Abstract. In this paper we look for conditions that are sufficient to guarantee that a
subset $A$ of a finite Abelian group $G$ contains the "expected" number of linear
configurations of a given type. The simplest non-trivial result of this kind is the
well-known fact that if $G$ has odd order, $A$ has density $\alpha$ and all Fourier
coefficients of the characteristic function of $A$ are significantly smaller than $\alpha$
(except the one at zero, which equals $\alpha$), then $A$ contains approximately
$\alpha^3|G|^2$ triples of the form $(a, a+d, a+2d)$. This is "expected" in the sense that a
random set $A$ of density $\alpha$ has approximately $\alpha^3|G|^2$ such triples with very
high probability.

More generally, it was shown in [Gow01] (in the case $G = \mathbb{Z}_N$ for $N$ prime, but the
proof generalizes) that a set $A$ of density $\alpha$ has about $\alpha^k|G|^2$ arithmetic
progressions of length $k$ if the characteristic function of $A$ is almost as small as it can
be, given its density, in a norm that is now called the $U^{k-1}$-norm. Green and Tao [GT06]
have found the most general statement that follows from the technique used to prove this
result, introducing a notion that they call the complexity of a system of linear forms. They
prove that if $A$ has almost minimal $U^{k+1}$-norm then it has the expected number of linear
configurations of a given type, provided that the associated complexity is at most $k$. The
main result of this paper is that the converse is not true: in particular there are certain
systems of complexity 2 that are controlled by the $U^2$-norm, whereas the result of Green and
Tao requires the stronger hypothesis of $U^3$-control.

We say that a system of $m$ linear forms $L_1, \dots, L_m$ in $d$ variables has true complexity
$k$ if $k$ is the smallest positive integer such that, for any set $A$ of density $\alpha$ and
almost minimal $U^{k+1}$-norm, the number of $d$-tuples $(x_1, \dots, x_d)$ such that
$L_i(x_1, \dots, x_d) \in A$ for every $i$ is approximately $\alpha^m|G|^d$. We conjecture that
the true complexity $k$ is the smallest positive integer $s$ for which the functions
$L_1^{s+1}, \dots, L_m^{s+1}$ are linearly independent. Using the "quadratic Fourier analysis"
of Green and Tao we prove this conjecture in the case where the complexity of the system (in
Green and Tao's sense) is 2, $s = 1$ and $G$ is the group $\mathbb{F}_p^n$ for some fixed odd
prime $p$.

A closely related result in ergodic theory was recently proved independently by Leibman
[Lei07]. We end the paper by discussing the connections between his result and ours.

1. Introduction

Let $A$ be a subset of a finite Abelian group $G$ and let $\alpha = |A|/|G|$ be the density of
$A$. We say that $A$ is uniform if it has one of several equivalent properties, each of which
says in its own way that $A$ "behaves like a random set". For example, writing $A$ for the
characteristic function of the set $A$, we can define the convolution $A * A$ by the formula

$$A * A(x) = \mathbb{E}_{y+z=x} A(y)A(z),$$

where the expectation is with respect to the uniform distribution over all pairs
$(y, z) \in G^2$ such that $y + z = x$; one of the properties in question is that the variance
of $A * A$ should be small. If this is the case and $G$ has odd order, then it is easy to show
that $A$ contains approximately $\alpha^3|G|^2$ triples of the form $(x, x+d, x+2d)$. Indeed,
these triples are the solutions $(x, y, z)$ of the equation $x + z = 2y$, and

$$\mathbb{E}_{x+z=2y} A(x)A(y)A(z) = \mathbb{E}_y A * A(2y)A(y).$$

The mean of the function $A * A$ is $\alpha^2$, so if the variance is sufficiently small, then
the right-hand side is approximately $\alpha^2\mathbb{E}_y A(y) = \alpha^3$. This is a
probabilistic way of saying that the number of solutions of $x + z = 2y$ inside $A$ is
approximately $\alpha^3|G|^2$, which is what we would expect if $A$ were a random set with
elements chosen independently with probability $\alpha$.
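
The count just described is easy to test numerically. The following is a minimal sketch, assuming $G = \mathbb{Z}_N$ with $N$ odd; the function name `count_3aps`, the modulus 1009 and the random set are illustrative choices, not taken from the argument above. It counts the pairs $(x, d)$ with $x, x+d, x+2d \in A$ by brute force and prints the count next to $\alpha^3 N^2$.

```python
import random


def count_3aps(A, N):
    """Count pairs (x, d) in Z_N^2 with x, x+d, x+2d all in A (d = 0 included)."""
    S = set(a % N for a in A)
    return sum(1 for x in S for d in range(N)
               if (x + d) % N in S and (x + 2 * d) % N in S)


if __name__ == "__main__":
    N = 1009  # an odd prime, chosen only for illustration
    rng = random.Random(0)  # fixed seed so that the run is reproducible
    A = [x for x in range(N) if rng.random() < 0.5]
    alpha = len(A) / N
    # For a random set the two numbers should be of comparable size.
    print(count_3aps(A, N), alpha ** 3 * N ** 2)
```

Degenerate triples with $d = 0$ are included, matching the convention that the expected count is $\alpha^3|G|^2$ over all pairs $(x, d)$.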

An easy generalization of the above argument shows that, given any linear equation in
$G$ of the form

$$c_1x_1 + c_2x_2 + \dots + c_mx_m = 0,$$

for suitable fixed coefficients $c_1, c_2, \dots, c_m$, the number of solutions in $A$ is
approximately $\alpha^m|G|^{m-1}$. Roughly speaking, you can choose $x_3, \dots, x_m$ in $A$
however you like, and if $A$ is sufficiently uniform then the number of ways of choosing $x_1$
and $x_2$ to lie in $A$ and satisfy the equation will almost always be roughly $\alpha^2|G|$.
By "suitable" we mean that there are certain divisibility problems that must be avoided. For
example, if $G$ is the group $\mathbb{F}_2^n$, $x + z = 2y$ and $x$ belongs to $A$, then $z$
belongs to $A$ for the trivial reason that it equals $x$. Throughout this paper we shall
consider groups of the form $\mathbb{F}_p^n$ for some prime $p$ and assume that $p$ is large
enough for such problems not to arise.

When $k \ge 4$, uniformity of a set $A$ does not guarantee that $A$ contains approximately
$\alpha^k|G|^2$ arithmetic progressions of length $k$. For instance, there are examples of
uniform subsets of $\mathbb{Z}_N$ that contain significantly more, or even significantly fewer
than, the expected number of four-term progressions [Gow06]. It was established in [Gow98]
that the appropriate measure for dealing with progressions of length 4 is a property known as
quadratic uniformity: sets which are sufficiently quadratically uniform contain roughly the
correct number of four-term progressions. We shall give precise definitions of higher-degree
uniformity in the next section, but for now let us simply state the result, proved in [Gow01]
in the case $G = \mathbb{Z}_N$, that if $A$ is uniform of degree $k-2$, then $A$ contains
approximately $\alpha^k|G|^2$ arithmetic progressions of length $k$. Moreover, if $A$ is
uniform of degree $j$ for some $j < k-2$, then it does not follow that $A$ must contain
approximately $\alpha^k|G|^2$ arithmetic progressions of length $k$.

The discrepancy between $k$ and $k-2$ seems slightly unnatural until one reformulates
the statement in terms of solutions of equations. We can define an arithmetic progression
of length $k$ either as a $k$-tuple of the form $(x, x+d, \dots, x+(k-1)d)$ or as a solution
$(x_1, x_2, \dots, x_k)$ to the system of $k-2$ equations $x_i - 2x_{i+1} + x_{i+2} = 0$,
$i = 1, 2, \dots, k-2$. In all the examples we have so far discussed, we need uniformity of
degree precisely $k$ in order to guarantee approximately the expected number of solutions of a
system of $k$ equations. It is tempting to ask whether this is true in general.

However, a moment's reflection shows that it is not. For example, the system of equations
$x_1 - 2x_2 + x_3 = 0$, $x_4 - 2x_5 + x_6 = 0$ has about $\alpha^6|G|^4$ solutions in a uniform
set, since the two equations are completely independent. This shows that a sensible conjecture
must take account of how the equations interact with each other.

A more interesting example is the system that consists of the $\binom{m}{3}$ equations
$x_{ij} + x_{jk} = x_{ik}$ in the $\binom{m}{2}$ unknowns $x_{ij}$, $1 \le i < j \le m$. These
equations are not all independent, but one can of course choose an independent subsystem of
them. It is not hard to see that there is a bijection between solutions of this system of
equations where every $x_{ij}$ belongs to $A$ and $m$-tuples $(x_1, \dots, x_m)$ such that
$x_j - x_i \in A$ whenever $i < j$. Now one can form a bipartite graph with two vertex sets
equal to $G$ by joining $x$ to $y$ if and only if $y - x \in A$. It is well-known that if $A$
is uniform, then this bipartite graph is quasirandom. The statement that every $x_j - x_i$
belongs to $A$ can be reformulated to say that $(x_1, \dots, x_m)$ form a clique in an
$m$-partite graph that is built out of quasirandom pieces derived from $A$. A "counting lemma"
from the theory of quasirandom graphs then implies easily that the number of such cliques is
approximately $\alpha^{\binom{m}{2}}|G|^m$. So uniformity of degree 1 is sufficient to
guarantee that there are about the expected number of solutions to this fairly complicated
system of equations.

In their recent work on configurations in the primes, Green and Tao [GT06] analysed
the arguments used to prove the above results, which are fairly simple and based on
repeated use of the Cauchy-Schwarz inequality. They isolated the property that a system
of equations, or equivalently a system of linear forms, must have in order for degree-$k$
uniformity to be sufficient for these arguments to work, and called this property complexity.
Since in this paper we shall have more than one notion of complexity, we shall sometimes
call their notion Cauchy-Schwarz complexity, or CS-complexity for short.

Definition 1.1. Let $\mathcal{L} = (L_1, \dots, L_m)$ be a system of $m$ linear forms in $d$
variables. For $1 \le i \le m$ and $s \ge 0$, we say that $\mathcal{L}$ is $s$-complex at $i$
if one can partition the $m-1$ forms $\{L_j : j \ne i\}$ into $s+1$ classes such that $L_i$
does not lie in the linear span of any of these classes. The Cauchy-Schwarz complexity (or
CS-complexity) of $\mathcal{L}$ is defined to be the least $s$ for which the system is
$s$-complex at $i$ for all $1 \le i \le m$, or $\infty$ if no such $s$ exists.

To get a feel for this definition, let us calculate the complexity of the system
$\mathcal{L}$ of $k$ linear forms $x, x+y, \dots, x+(k-1)y$. Any two distinct forms $x + iy$
and $x + jy$ in $\mathcal{L}$ contain $x$ and $y$ in their linear span. Therefore, whichever
form $L$ we take from $\mathcal{L}$, if we wish to partition the others into classes that do
not contain $L$ in their linear span, then we must take these classes to be singletons. Since
we are partitioning $k-1$ forms, this tells us that the minimal $s$ is $k-2$. So $\mathcal{L}$
has complexity $k-2$.

Next, let us briefly look at the system $\mathcal{L}$ of $\binom{m}{2}$ forms $x_i - x_j$
($1 \le i < j \le m$) that we discussed above. If $L$ is the form $x_i - x_j$ then no other
form $L' \in \mathcal{L}$ involves both $x_i$ and $x_j$, so we can partition
$\mathcal{L} \setminus \{L\}$ into the forms that involve $x_i$ (which therefore do not involve
$x_j$) and the forms that do not involve $x_i$. Since neither class includes $L$ in its linear
span, the complexity of $\mathcal{L}$ is at most 1. When $m \ge 3$ it can also be shown to be
at least 1.
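
Definition 1.1 can be checked mechanically for small systems. The sketch below is a brute-force illustration under two assumptions that are ours: forms are given as integer coefficient vectors, and linear spans are taken over $\mathbb{Q}$ (for $G = \mathbb{F}_p^n$ one would work over $\mathbb{F}_p$ instead). The helper names `rank`, `in_span` and `cs_complexity` are hypothetical, and the search is exponential in the number of forms, so it is only suitable for toy examples.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank over Q of a list of coefficient vectors (Gaussian elimination)."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def in_span(form, cls):
    """True if `form` lies in the linear span of the forms in `cls`."""
    if not cls:
        return all(v == 0 for v in form)
    return rank(cls + [form]) == rank(cls)


def cs_complexity(forms):
    """Least s such that the system is s-complex at every i; None if no s exists."""
    forms = [list(f) for f in forms]
    worst = 0
    for i, L in enumerate(forms):
        rest = forms[:i] + forms[i + 1:]
        if not rest:
            continue
        s_i = None
        for t in range(1, len(rest) + 1):  # t = s + 1 classes (some may be empty)
            for labels in product(range(t), repeat=len(rest)):
                classes = [[f for f, l in zip(rest, labels) if l == c]
                           for c in range(t)]
                if not any(in_span(L, cl) for cl in classes):
                    s_i = t - 1
                    break
            if s_i is not None:
                break
        if s_i is None:
            return None  # L is a multiple of another form: complexity is infinite
        worst = max(worst, s_i)
    return worst
```

For instance, `cs_complexity([[1, 0], [1, 1], [1, 2], [1, 3]])` treats the four forms $x, x+y, x+2y, x+3y$ and reproduces the value $k - 2 = 2$ computed above, while the three differences $x_1 - x_2$, $x_1 - x_3$, $x_2 - x_3$ give complexity 1.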

It follows from Green and Tao's result that if $A$ is sufficiently uniform and
$\mathcal{L} = (L_1, \dots, L_m)$ has complexity at most 1, then $A$ contains approximately the
expected number of $m$-tuples of the form $(L_1(x_1, \dots, x_d), \dots, L_m(x_1, \dots, x_d))$.
(If the forms are defined over $\mathbb{Z}_N$, then this number is $\alpha^m N^d$.)

Notice that this statement adequately explains all the cases we have so far looked at 
in which uniformity implies the correct number of solutions. It is thus quite natural to 
conjecture that Green and Tao's result is tight. That is, one might guess that if the 
complexity of $\mathcal{L}$ is greater than 1 then there exist sets $A$ that do not have the
correct number of images of $\mathcal{L}$.

But is this correct? Let us look at what is known in the other direction, by discussing
briefly the simplest example that shows that uniform sets in $\mathbb{Z}_N$ do not have to
contain the correct number of arithmetic progressions of length 4. (Here we are taking $N$ to
be some large prime.) Roughly speaking, one takes $A$ to be the set of all $x$ such that
$x^2$ mod $N$ is small. Then one makes use of the identity

$$x^2 - 3(x+d)^2 + 3(x+2d)^2 - (x+3d)^2 = 0$$

to prove that if $x$, $x+d$ and $x+2d$ all lie in $A$, then $x+3d$ is rather likely to lie in
$A$ as well, because $(x+3d)^2$ is a small linear combination of small elements of
$\mathbb{Z}_N$. This means that $A$ has "too many" progressions of length 4. (Later, we shall
generalize this example and make it more precise.)

The above argument uses the fact that the squares of the linear forms $x$, $x+d$, $x+2d$
and $x+3d$ are linearly dependent. Later, we shall show that if $\mathcal{L}$ is any system of
linear forms whose squares are linearly dependent, then essentially the same example works for
$\mathcal{L}$. This gives us a sort of "upper bound" for the set of systems $\mathcal{L}$ that
have approximately the right number of images in any uniform set: because of the above example,
we know that the squares of the forms in any such system $\mathcal{L}$ must be linearly
independent.

And now we arrive at the observation that motivated this paper: the "upper bound"
just described does not coincide with the "lower bound" of Green and Tao. That is, there
are systems of linear forms of complexity greater than 1 with squares that are linearly
independent. One of the simplest examples is the system
$(x, y, z, x+y+z, x+2y-z, x+2z-y)$. Another, which is translation-invariant (in the sense that
if you add a constant to everything in the configuration, you obtain another configuration of
the same type), is $(x, x+y, x+z, x+y+z, x+y-z, x+z-y)$. A third and rather natural example
that is also translation-invariant is the configuration

$$(x, x+a, x+b, x+c, x+a+b, x+a+c, x+b+c),$$

which can be thought of as a cube minus a point. All these examples have complexity 2,
but it is not hard to produce examples with arbitrarily high complexity.

In the light of such examples, we are faced with an obvious question: which systems
of linear forms have roughly the expected number of images in any sufficiently uniform
set? We conjecture that the correct answer is given by the "upper bound": that is, that
square independence is not just necessary but also sufficient. When the group $G$ is
$\mathbb{F}_p^n$ for a fixed prime $p$, we prove this conjecture for systems of complexity 2.
This includes the two examples above, and shows that having Cauchy-Schwarz complexity at most 1
is not a necessary condition, even if it is a natural sufficient one.

However, the proof is much deeper for systems of complexity 2. Although the statement
of our result is completely linear, we use "quadratic Fourier analysis", recently developed
by Green and Tao [GT05a], to prove it, and it seems that we are forced to do so. Thus, it
appears that Cauchy-Schwarz complexity captures the systems for which an easy argument
exists, while square independence captures the systems for which the result is true.

Very recently, and independently, Leibman [Lei07] described a similar phenomenon in
the ergodic-theoretic context. In the final section of the paper we shall briefly outline how
his results relate to ours.

So far, we have concentrated on uniform sets. However, in the next section we shall define
higher-degree uniformity and formulate a more complete conjecture, which generalizes the
above discussion in a straightforward way. Green and Tao proved that a system of
Cauchy-Schwarz complexity $k$ has approximately the correct number of images in a set $A$ if
$A$ is sufficiently uniform of degree $k+1$. Once again, it seems that this is not the whole
story, and that the following stronger statement should be true: a linear system
$\mathcal{L} = (L_1, \dots, L_m)$ has the right number of images in any set $A$ that is
sufficiently uniform of degree $k$ if and only if the functions $L_i^{k+1}$ are linearly
independent. The reason we have not proved this is that the natural generalization of our
existing argument would have to use an as yet undeveloped general "polynomial Fourier
analysis", which is known only in the quadratic case. However, it is easy to see how our
arguments would generalize if such techniques were available, which is compelling evidence
that our conjecture (which we will state formally in a moment) is true.

2. Uniformity norms and true complexity 

As promised, let us now give a precise definition of higher-degree uniformity. We begin 
by defining a sequence of norms, known as uniformity norms. 

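Definition 2.1. Let $G$ be a finite Abelian group. For any positive integer $k \ge 2$ and any
function $f : G \to \mathbb{C}$, define the $U^k$-norm by the formula

$$\|f\|_{U^k}^{2^k} := \mathbb{E}_{x, h_1, \dots, h_k \in G} \prod_{\omega \in \{0,1\}^k} C^{|\omega|} f(x + \omega \cdot h),$$

where $\omega \cdot h$ is shorthand for $\sum_i \omega_i h_i$, and $C^{|\omega|} f = f$ if
$\sum_i \omega_i$ is even and $\overline{f}$ otherwise.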

These norms were first defined in [Gow01] (in the case where $G$ is the group
$\mathbb{Z}_N$). Of particular interest in this paper will be the $U^2$-norm and the
$U^3$-norm. The former can be described in many different ways. The definition above expresses
it as the fourth root of the average of

$$f(x)\overline{f(x+h)}\,\overline{f(x+h')}f(x+h+h')$$

over all triples $(x, h, h')$. It is not hard to show that this average is equal to
$\|f * f\|_2^2$, and also to $\|\hat{f}\|_4^4$. (These identities depend on appropriate
normalizations: we follow the most commonly used convention of taking averages in physical
space and sums in frequency space.)
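
The second of these identities can be checked numerically on a small cyclic group. The following is a minimal sketch, assuming $G = \mathbb{Z}_N$ and the normalization $\hat{f}(r) = \mathbb{E}_x f(x)\omega^{rx}$ with $\omega = \exp(2\pi i/N)$; the function names are ours. It computes the fourth power of the $U^2$-norm directly from Definition 2.1 and, separately, the sum of the fourth powers of the Fourier coefficients.

```python
import cmath


def u2_fourth_power(f):
    """Average of f(x) conj(f(x+h)) conj(f(x+k)) f(x+h+k) over Z_N^3."""
    N = len(f)
    total = 0
    for x in range(N):
        for h in range(N):
            for k in range(N):
                total += (f[x] * f[(x + h) % N].conjugate()
                          * f[(x + k) % N].conjugate() * f[(x + h + k) % N])
    return (total / N ** 3).real


def fourier_l4_fourth_power(f):
    """Sum over r of |hat f(r)|^4, where hat f(r) = E_x f(x) omega^{r x}."""
    N = len(f)
    total = 0.0
    for r in range(N):
        coeff = sum(f[x] * cmath.exp(2j * cmath.pi * r * x / N)
                    for x in range(N)) / N
        total += abs(coeff) ** 4
    return total
```

On any list of (real or complex) values the two functions should agree up to floating-point error; for a character $x \mapsto \omega^{x}$ both equal 1.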

We shall call a function $f$ $c$-uniform if $\|f\|_{U^2} \le c$ and $c$-quadratically uniform
if $\|f\|_{U^3} \le c$. We shall often speak more loosely and describe a function as uniform
if it is $c$-uniform for some small $c$, and similarly for higher-degree uniformity. We remark
here that if $j \le k$ then $\|f\|_{U^j} \le \|f\|_{U^k}$, so $c$-uniformity of degree $k$
implies $c$-uniformity of all lower degrees.

If $A$ is a subset of an Abelian group $G$ and the density of $A$ is $\alpha$, then we say
that $A$ is uniform of degree $k$ if it is close in the $U^k$-norm to the constant function
$\alpha$. More precisely, we define the balanced function $f(x) = A(x) - \alpha$ and say that
$A$ is $c$-uniform of degree $k$ if $\|f\|_{U^k} \le c$.

The following theorem is essentially Theorem 3.2 in [Gow01]. (More precisely, in that
paper the theorem was proved for the group $\mathbb{Z}_N$, but the proof is the same.)

Theorem 2.2. Let $k \ge 2$ and let $G$ be a finite Abelian group such that there are no
non-trivial solutions to the equation $jx = 0$ for any $1 \le j < k$. Let $c > 0$ and let
$f_1, f_2, \dots, f_k$ be functions from $G$ to $\mathbb{C}$ such that $\|f_i\|_\infty \le 1$
for every $i$. Then

$$\left|\mathbb{E}_{x,y \in G} f_1(x) f_2(x+y) \cdots f_k(x+(k-1)y)\right| \le \|f_k\|_{U^{k-1}}.$$

It follows easily from this result that if $A$ is a set of density $\alpha$ and $A$ is
$c$-uniform for sufficiently small $c$, then $A$ contains approximately $\alpha^k|G|^2$
arithmetic progressions of length $k$. Very briefly, the reason for this is that we are trying
to show that the average

$$\mathbb{E}_{x,y} A(x)A(x+y) \cdots A(x+(k-1)y)$$

is close to $\alpha^k$. Now this average is equal to

$$\mathbb{E}_{x,y} A(x)A(x+y) \cdots f(x+(k-1)y) + \alpha\,\mathbb{E}_{x,y} A(x)A(x+y) \cdots A(x+(k-2)y).$$

The first of these terms is at most $c$, by Theorem 2.2, and the second can be handled
inductively. The bound we obtain in this way is $c(1 + \alpha + \dots + \alpha^{k-1}) \le kc$.

We can now state formally Green and Tao's generalization in terms of CS-complexity in
the case where $G$ is the group $\mathbb{Z}_N$, which is implicit in [GT06].

Theorem 2.3. Let $N$ be a prime, let $f_1, \dots, f_m$ be functions from $\mathbb{Z}_N$ to
$[-1,1]$, and let $\mathcal{L}$ be a linear system of CS-complexity $k$ consisting of $m$ forms
in $d$ variables. Then, provided



Just as in the case of arithmetic progressions, it follows easily that if $A$ is a subset of
$G$ of density $\alpha$, then the probability, given a random element
$(x_1, \dots, x_d) \in G^d$, that all the $m$ images $L_i(x_1, \dots, x_d)$ lie in $A$ is
approximately $\alpha^m$. (The inductive argument depends on the obvious fact that if
$\mathcal{L}$ has complexity at most $k$ then so does any subsystem of $\mathcal{L}$.)

Green and Tao proved the above theorem because they were investigating which linear 
configurations can be found in the primes. For that purpose, they in fact needed a more 
sophisticated "relative" version of the statement. Since the proof of the version we need 
here is simpler (partly because we are discussing systems of complexity at most 2, but 
much more because we do not need a relative version), we give it for the convenience of 
the reader. This is another result where the proof is essentially the same for all Abelian 
groups, give or take questions of small torsion. Since we need it in the case
$G = \mathbb{F}_p^n$, we shall just prove it for this group. The reader should bear in mind
that for this group, one should understand linear independence of a system of forms as
independence over $\mathbb{F}_p$ when one is defining complexity (and also
square-independence).

The first step of Green and Tao's proof was to put an arbitrary linear system into a
convenient form for proofs. Given a linear form $L$ in $d$ variables $x_1, \dots, x_d$, let us
define the support of $L$ to be the set of $j$ such that $L$ depends on $x_j$. That is, if
$L(x_1, \dots, x_d) = \lambda_1x_1 + \dots + \lambda_dx_d$ then the support of $L$ is
$\{j : \lambda_j \ne 0\}$. Let $\mathcal{L} = (L_1, \dots, L_m)$ be a system of linear forms
and let the support of $L_i$ be $\sigma_i$ for each $i$. Then $\mathcal{L}$ is said to be in
$s$-normal form if it is possible to find subsets $\tau_i \subset \sigma_i$ for each $i$ with
the following two properties.

(i) Each $\tau_i$ has cardinality at most $s + 1$.

(ii) If $i \ne j$ then $\tau_i$ is not a subset of $\sigma_j$.

If a linear system $\mathcal{L}$ is in $s$-normal form, then it has complexity at most $s$.
Indeed, if $\tau_i$ has $r$ elements $\{j_1, \dots, j_r\}$, then one can partition the
remaining forms into $r$ sets $\mathcal{L}_1, \dots, \mathcal{L}_r$ in such a way that no form
in $\mathcal{L}_h$ uses the variable $x_{j_h}$. Since $L_i$ does use the variable $x_{j_h}$,
it is not in the linear span of $\mathcal{L}_h$.

The converse of this statement is false, but Green and Tao prove that every linear system
of complexity $s$ can be "extended" to one that is in $s$-normal form. This part of the proof
is the same in both contexts, so we do not reproduce it. All we need to know here is that
if we prove Theorem 2.3 for systems in normal form then we have it for general systems.

Just to illustrate this, consider the obvious system associated with arithmetic progressions
of length 4, namely $(x, x+y, x+2y, x+3y)$. This is not in 2-normal form, because
the support of the first form is contained in the supports of the other three. However, the
system $(-3x - 2y - z,\ -2x - y + w,\ -x + z + 2w,\ y + 2z + 3w)$ is in 2-normal form (since
the supports have size 3 and are distinct) and its images are also uniformly distributed over
all arithmetic progressions of length 4 (if we include degenerate ones).
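
The two conditions defining $s$-normal form are purely combinatorial, so they can be tested directly on the supports. The sketch below is an illustration only: forms are assumed to be given as coefficient vectors, and the names `support` and `is_in_normal_form` are ours. Since condition (ii) for a given $i$ involves only $\tau_i$ and the other supports, each $\tau_i$ can be searched for independently.

```python
from itertools import combinations


def support(form):
    """Indices of the variables on which the form actually depends."""
    return frozenset(j for j, c in enumerate(form) if c != 0)


def is_in_normal_form(forms, s):
    """Check s-normal form: each support sigma_i has a non-empty subset tau_i
    with |tau_i| <= s + 1 that is not contained in any other support sigma_j."""
    sigmas = [support(f) for f in forms]
    for i, sigma in enumerate(sigmas):
        others = [t for j, t in enumerate(sigmas) if j != i]
        ok = any(
            all(not frozenset(tau) <= o for o in others)
            for size in range(1, s + 2)
            for tau in combinations(sorted(sigma), size)
        )
        if not ok:
            return False
    return True
```

With variables ordered as $(x, y, z, w)$, the coefficient vectors `[-3, -2, -1, 0]`, `[-2, -1, 0, 1]`, `[-1, 0, 1, 2]`, `[0, 1, 2, 3]` encode the second system above; consecutive forms differ by $x + y + z + w$, which is why the images form an arithmetic progression.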

Now let us prove Theorem 2.3 when $k = 2$. Without loss of generality we may assume
that $\mathcal{L}$ is in 2-normal form at 1, and that $L_1$ is the only form using all three
variables $x_1 = x$, $x_2 = y$ and $x_3 = z$. We use the shorthand
$h(x, y, z) = f(L_1(x_1, x_2, x_3))$, and denote by $b(x, y)$ any general bounded function in
two variables $x$ and $y$. It is then possible to rewrite

$$\mathbb{E}_{x_1, \dots, x_d \in \mathbb{F}_p^n} \prod_{i=1}^m f(L_i(x_1, \dots, x_d))$$

as

$$\mathbb{E}_{x_4, x_5, \dots, x_d} \mathbb{E}_{x,y,z}\, h(x, y, z)\, b(x, y)\, b(y, z)\, b(x, z).$$

Here, the functions $h$ and $b$ depend on the variables $x_4, \dots, x_d$ but we are
suppressing this dependence in the notation.

Estimating the expectation over $(x, y, z)$ is a well-known argument from the theory of
quasirandom hypergraphs. (See for instance Theorem 4.1 in [Gow04].) First, we apply
Cauchy-Schwarz and use the boundedness of $b$ to obtain an upper bound of

$$\left(\mathbb{E}_{x,y}\left(\mathbb{E}_z\, h(x, y, z)\, b(x, z)\, b(y, z)\right)^2\right)^{1/2}.$$

Expanding out the square and rearranging yields

$$\left(\mathbb{E}_{y,z,z'}\, b(y, z)\, b(y, z')\, \mathbb{E}_x\, h(x, y, z)\, h(x, y, z')\, b(x, z)\, b(x, z')\right)^{1/2},$$

and by a second application of Cauchy-Schwarz we obtain an upper bound of

$$\left(\mathbb{E}_{y,z,z'}\left(\mathbb{E}_x\, h(x, y, z)\, h(x, y, z')\, b(x, z)\, b(x, z')\right)^2\right)^{1/4}.$$

A second round of interchanging summation followed by a third application of
Cauchy-Schwarz gives us an upper bound of

$$\left(\mathbb{E}_{x,x',z,z'}\left(\mathbb{E}_y\, h(x, y, z)\, h(x, y, z')\, h(x', y, z)\, h(x', y, z')\right)^2\right)^{1/8}.$$

This expression equals the "octahedral norm" of the function $h(x, y, z)$, a hypergraph
analogue of the $U^3$-norm. Because for fixed $x_4, \dots, x_d$, $h$ depends only on the linear
expression $L_1(x, y, z)$, a simple change of variables can be used to show that it is in fact
equal to

$$\|f\|_{U^3}.$$

Now all that remains is to take the expectation over the remaining variables and the
proof is complete. It is also not hard to generalize to arbitrary $k$, but this we leave as an
exercise to the reader.

Now, as we stated earlier, Theorem 2.3 does not settle the question of which systems are
controlled by which degrees of uniformity. Accordingly, we make the following definition.

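Definition 2.4. Let $\mathcal{L}$ be a system of $m$ distinct linear forms
$L_1, L_2, \dots, L_m$ in $d$ variables. The true complexity of $\mathcal{L}$ is the smallest
$k$ with the following property. For every $\epsilon > 0$ there exists $\delta > 0$ such that
if $G$ is any finite Abelian group and $f : G \to \mathbb{C}$ is any function with
$\|f\|_\infty \le 1$ and $\|f\|_{U^{k+1}} \le \delta$, then

$$\left|\mathbb{E}_{x_1, \dots, x_d \in G} \prod_{i=1}^m f(L_i(x_1, \dots, x_d))\right| \le \epsilon.$$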

The main conjecture of this paper is now simple to state precisely. 

Conjecture 2.5. The true complexity of a system of linear forms
$\mathcal{L} = (L_1, \dots, L_m)$ is equal to the smallest $k$ such that the functions
$L_i^{k+1}$ are linearly independent.

In the next section, we shall prove this conjecture in the simplest case that is not covered 
by the result of Green and Tao, namely the case when $k = 1$ and $\mathcal{L}$ has CS-complexity 2.
All other cases would require a more advanced form of polynomial Fourier analysis than 
the quadratic Fourier analysis that is so far known, but we shall explain why it will almost 
certainly be possible to generalize our argument once such a theory is developed. 

3. True complexity for vector spaces over finite fields 

We shall now follow the course that is strongly advocated by Green [Gre05a] and restrict
attention to the case where $G$ is the group $\mathbb{F}_p^n$, where $p$ is a fixed prime and
$n$ tends to infinity. The reason for this is that it makes many arguments technically simpler
than they are for groups with large torsion such as $\mathbb{Z}_N$. In particular, one can
avoid the technicalities associated with Bohr sets. These arguments can then almost always be
converted into more complicated arguments for $\mathbb{Z}_N$. (In a forthcoming paper, we give
a different proof for the case $\mathbb{F}_p^n$ and carry out the conversion process. That
proof is harder than the proof here but gives significantly better bounds and is easier to
convert.)

We begin this section with the easier half of our argument, showing that if $\mathcal{L}$ is
a system of linear forms $(L_1, \dots, L_m)$ and if there is a linear dependence between the
squares of these forms, then the true complexity of $\mathcal{L}$ is greater than 1. This part
can be proved almost as easily for $\mathbb{Z}_N$, but we shall not do so here.

3.1. Square-independence is necessary. Let us start by briefly clarifying what we
mean by square-independence of a linear system $\mathcal{L} = (L_1, \dots, L_m)$. When the
group $G$ is $\mathbb{Z}_N$, then all we mean is that the functions $L_i^2$ are linearly
independent, but when it is $\mathbb{F}_p^n$, then this definition does not make sense any
more. Instead, we ask for the quadratic forms $L_i^TL_i$ to be linearly independent. If
$L_i(x_1, \dots, x_d) = \sum_r \gamma_r^{(i)} x_r$, then
$L_i^TL_i(x_1, \dots, x_d) = \sum_r \sum_s \gamma_r^{(i)}\gamma_s^{(i)} x_r^Tx_s$. Therefore,
what we are interested in is linear independence of the matrices
$\Gamma^{(i)}_{rs} = \gamma_r^{(i)}\gamma_s^{(i)}$ over $\mathbb{F}_p$. (Note that in the case
of $\mathbb{Z}_N$, this is equivalent to independence of the functions $L_i^2$.)
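
Since square-independence is a rank condition on the matrices $\Gamma^{(i)}$, it can be tested by linear algebra over $\mathbb{F}_p$. The following is a minimal sketch under our own assumptions: forms are integer coefficient vectors, $p$ is an odd prime, each matrix $\Gamma^{(i)}$ is flattened into a vector of length $d^2$, and the names `rank_mod_p` and `is_square_independent` are hypothetical. It relies on Python's three-argument `pow` with exponent $-1$ for modular inverses (available from Python 3.8).

```python
def rank_mod_p(rows, p):
    """Rank over F_p of a list of integer vectors (Gaussian elimination mod p)."""
    m = [[v % p for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c]), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        inv = pow(m[r][c], -1, p)  # modular inverse of the pivot entry
        for i in range(r + 1, len(m)):
            factor = m[i][c] * inv % p
            m[i] = [(a - factor * b) % p for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_square_independent(forms, p):
    """True if the matrices Gamma^{(i)}_{rs} = gamma_r gamma_s are independent mod p."""
    gammas = [[g_r * g_s for g_r in f for g_s in f] for f in forms]
    return rank_mod_p(gammas, p) == len(forms)
```

For example, with $p = 7$ the four forms $x, x+d, x+2d, x+3d$ are reported as square-dependent, as the identity for four-term progressions predicts, while the "cube minus a point" configuration is reported as square-independent.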

Theorem 3.1. Let $\mathcal{L} = (L_1, \dots, L_m)$ be a system of linear forms in $d$ variables
and suppose that the quadratic forms $L_i^TL_i$ are linearly dependent over $\mathbb{F}_p$.
Then there exists $\epsilon > 0$ such that for every $\delta > 0$ there exists $n$ and a set
$A \subset \mathbb{F}_p^n$ with the following two properties.

(i) $A$ is $\delta$-uniform of degree 1.

(ii) If $\mathbf{x} = (x_1, \dots, x_d)$ is chosen randomly from $(\mathbb{F}_p^n)^d$, then the
probability that $L_i(\mathbf{x})$ is in $A$ for every $i$ is at least $\alpha^m + \epsilon$,
where $\alpha$ is the density of $A$.

In other words, the true complexity of $\mathcal{L}$ is at least 2.

For the proof we require the following standard lemma, which says that certain Gauss 
sums are small. A proof can be found in [Gre05b], for example.

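Lemma 3.2. Suppose that $q : \mathbb{F}_p^n \to \mathbb{F}_p$ is a quadratic form of rank $r$.
That is, suppose that $q(x) = x^TMx + b^Tx$ for some matrix $M$ of rank $r$ and some vector
$b \in \mathbb{F}_p^n$. Then

$$\left|\mathbb{E}_{x \in \mathbb{F}_p^n}\, \omega^{q(x)}\right| \le p^{-r/2},$$

with equality if $b = 0$. In particular,

$$\left|\mathbb{E}_{x \in \mathbb{F}_p^n}\, \omega^{\eta x^Tx + b^Tx}\right| \le p^{-n/2}$$

for any non-zero $\eta \in \mathbb{F}_p$.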



Proof of Theorem 3.1. Let $A$ be the set $\{x \in \mathbb{F}_p^n : x^Tx = 0\}$. Then the
characteristic function of $A$ can be written as

$$A(x) = \mathbb{E}_u\, \omega^{u x^Tx},$$

where $\omega = \exp(2\pi i/p)$ and the expectation is taken over $\mathbb{F}_p$. Let us now
take any square-independent system $\mathcal{L} = (L_1, \dots, L_m)$ of linear forms in
$\mathbf{x} = (x_1, \dots, x_d)$ and estimate the expectation
$\mathbb{E}_{\mathbf{x}} \prod_i A(L_i(\mathbf{x}))$.

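Using the formula for $A(x)$, we can rewrite this expectation as

$$\mathbb{E}_{\mathbf{x}}\, \mathbb{E}_{u_1, \dots, u_m}\, \omega^{\sum_i u_i L_i(\mathbf{x})^TL_i(\mathbf{x})}.$$

We can break this up into $p^m$ expectations over $\mathbf{x}$, one for each choice of
$u_1, \dots, u_m$.

If the $u_i$ are all zero, then the expectation over $\mathbf{x}$ is just the expectation of
the constant function 1, so it is 1. Otherwise, since the quadratic forms $L_i^TL_i$ are
linearly independent, the sum $\sum_i u_i L_i(\mathbf{x})^TL_i(\mathbf{x})$ is a non-zero
quadratic form $q(\mathbf{x}) = \sum_{i,j} \gamma_{ij} x_i^Tx_j$.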

Without loss of generality, there exists $j$ such that $\gamma_{1j} \ne 0$. If in addition
$\gamma_{11} = 0$, then for every choice of $x_2, \dots, x_d$ we can write $q(\mathbf{x})$ in
the form $r^Tx_1 + z$, where $r = \sum_j \gamma_{1j}x_j$ and $z$ depends on $x_2, \dots, x_d$
only. This is a non-constant linear function of $x_1$ except when
$\sum_j \gamma_{1j}x_j = 0$. Since not every $\gamma_{1j}$ is zero, this happens with
probability $p^{-n}$. Therefore, $|\mathbb{E}_{\mathbf{x}}\, \omega^{q(\mathbf{x})}| \le p^{-n}$
in this case. If $\gamma_{11} \ne 0$, then this same function has the form
$\gamma_{11}x_1^Tx_1 + r^Tx_1$ for some element $r \in \mathbb{F}_p^n$ (which depends on
$x_2, \dots, x_d$). In this case, Lemma 3.2 implies that the expectation is at most $p^{-n/2}$.

Since the probability that $u_1 = \dots = u_m = 0$ is $p^{-m}$, this shows that

$$\left|\mathbb{E}_{\mathbf{x} \in (\mathbb{F}_p^n)^d} \prod_i A(L_i(\mathbf{x})) - p^{-m}\right| \le p^{-n/2}.$$

Applying this result in the case where $\mathcal{L}$ consists of the single form $x$, we see
that the density of $A$ differs from $p^{-1}$ by at most $p^{-n/2}$. Therefore, we have shown
that for this particular set $A$, square-independence of $\mathcal{L}$ guarantees approximately
the "correct" probability that every $L_i(\mathbf{x})$ lies in $A$.

This may seem like the opposite of what we were trying to prove, but in fact we have
almost finished, for the following simple reason. If we now take $\mathcal{L}$ to be an
arbitrary system $(L_1, \dots, L_m)$ of linear forms, then we can choose from it a maximal
square-independent subsystem. Without loss of generality this subsystem is
$(L_1, \dots, L_l)$. Then all the quadratic forms $L_i^TL_i$ with $i > l$ are linear
combinations of $L_1^TL_1, \dots, L_l^TL_l$, so a sufficient condition for every
$L_i^TL_i(\mathbf{x})$ to be zero is that it is zero for every $i \le l$. But this we know
happens with probability approximately $p^{-l}$ by what we have just proved. Therefore, if
$\mathcal{L}$ is not square-independent, then $A^m$ contains "too many" $m$-tuples of the form
$(L_1(\mathbf{x}), \dots, L_m(\mathbf{x}))$. □
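
The phenomenon in this proof can already be seen in a very small case. The sketch below is a toy illustration under our own assumptions: it takes $p = 5$ and $n = 2$ (the theorem itself is asymptotic in $n$), uses the square-dependent system $(x, x+d, x+2d, x+3d)$, and the function names are hypothetical. It builds $A = \{x : x^Tx = 0\}$ by enumeration and computes by brute force the probability that a random four-term progression lies in $A$.

```python
from itertools import product


def quadric_set(p, n):
    """A = {x in F_p^n : x^T x = 0}, as a set of tuples."""
    return {x for x in product(range(p), repeat=n)
            if sum(c * c for c in x) % p == 0}


def four_ap_probability(p, n):
    """Probability that x, x+d, x+2d, x+3d all lie in A for uniform (x, d)."""
    A = quadric_set(p, n)
    points = list(product(range(p), repeat=n))
    hits = 0
    for x in A:  # x itself must lie in A
        for d in points:
            if all(tuple((a + j * b) % p for a, b in zip(x, d)) in A
                   for j in (1, 2, 3)):
                hits += 1
    return hits / len(points) ** 2


if __name__ == "__main__":
    p, n = 5, 2
    alpha = len(quadric_set(p, n)) / p ** n
    # Compare the actual probability with the "expected" value alpha^4.
    print(four_ap_probability(p, n), alpha ** 4)
```

In this toy case the density of $A$ is within $p^{-n/2}$ of $p^{-1}$, and the probability of a four-term progression exceeds $\alpha^4$, in line with the argument above.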



3.2. A review of quadratic Fourier analysis. We shall now turn our attention to the
main result of this paper, which states that if $\mathcal{L}$ has CS-complexity at most 2 and
is square-independent, then the true complexity of $\mathcal{L}$ is at most 1. We begin with a
quick review of quadratic Fourier analysis for functions defined on $\mathbb{F}_p^n$. Our aim
in this review is to give precise statements of the results that we use in our proof. The
reader who is prepared to use quadratic Fourier analysis as a black box should then find that
this paper is self-contained.

So far in our discussion of uniformity, we have made no mention of Fourier analysis at
all. However, at least for the $U^2$-norm, there is a close connection. Let $f$ be a
complex-valued function defined on a finite Abelian group $G$. If $\chi$ is a character on $G$,
the Fourier coefficient $\hat{f}(\chi)$ is defined to be $\mathbb{E}_x f(x)\chi(x)$. The
resulting Fourier transform satisfies the convolution identity
$\widehat{f * g} = \hat{f}\hat{g}$, Parseval's identity $\|f\|_2 = \|\hat{f}\|_2$ and the
inversion formula $f(x) = \sum_\chi \hat{f}(\chi)\chi(-x)$. (The second and third identities
depend on the correct choice of normalization: $\|f\|_2^2$ is defined to be
$\mathbb{E}_x|f(x)|^2$, whereas $\|\hat{f}\|_2^2$ is defined to be
$\sum_\chi|\hat{f}(\chi)|^2$. That is, as mentioned earlier, we take averages in $G$ and sums
in $\hat{G}$.) It follows that $\|f\|_{U^2}^4 = \|\hat{f}\|_4^4$, since both are equal to
$\|f * f\|_2^2$.

It is often useful to split a function $f$ up into a "structured" part and a uniform part.
One way of doing this is to let $K$ be the set of all characters $\chi$ for which
$|\hat{f}(\chi)|$ is larger than some $\delta$ and to write $f = f_1 + f_2$, where
$f_1 = \sum_{\chi \in K} \hat{f}(\chi)\chi(-x)$ and
$f_2 = \sum_{\chi \notin K} \hat{f}(\chi)\chi(-x)$. If $\|f\|_\infty \le 1$ (as it is in many
applications), then Parseval's identity implies that $|K| \le \delta^{-2}$, and can also be
used to show that $\|f_2\|_{U^2} \le \delta^{1/2}$. That is, $K$ is not too large, and $f_2$ is
$\delta^{1/2}$-uniform.
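
This decomposition is straightforward to carry out explicitly. The following is a minimal sketch, assuming $G = \mathbb{Z}_N$ rather than $\mathbb{F}_p^n$ and the normalization described above; the function names `fourier`, `split` and `u2_norm` are ours. It computes the large spectrum $K$, reconstructs $f_1$ and $f_2$ by the inversion formula, and evaluates $\|f_2\|_{U^2}$ through the identity $\|f\|_{U^2}^4 = \|\hat{f}\|_4^4$.

```python
import cmath


def fourier(f):
    """hat f(r) = E_x f(x) omega^{r x} on Z_N (averages in physical space)."""
    N = len(f)
    return [sum(f[x] * cmath.exp(2j * cmath.pi * r * x / N) for x in range(N)) / N
            for r in range(N)]


def split(f, delta):
    """Return (K, f1, f2) with f = f1 + f2 and K = {r : |hat f(r)| > delta}."""
    N = len(f)
    fhat = fourier(f)
    K = [r for r in range(N) if abs(fhat[r]) > delta]

    def synth(freqs):
        # Inversion formula: sums in frequency space, character evaluated at -x.
        return [sum(fhat[r] * cmath.exp(-2j * cmath.pi * r * x / N) for r in freqs)
                for x in range(N)]

    f1 = synth(K)
    f2 = synth([r for r in range(N) if r not in K])
    return K, f1, f2


def u2_norm(f):
    """U^2 norm via the identity ||f||_{U^2}^4 = sum_r |hat f(r)|^4."""
    return sum(abs(c) ** 4 for c in fourier(f)) ** 0.25
```

For any $f$ with $\|f\|_\infty \le 1$, the output should satisfy `len(K) <= delta ** -2` and `u2_norm(f2) <= delta ** 0.5` up to rounding, which are exactly the two bounds stated above.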

When $G$ is the group $\mathbb{F}_p^n$, the characters all have the form
$x \mapsto \omega^{r^Tx}$. Notice that this character is constant on all sets of the form
$\{x : r^Tx = u\}$, and that these sets partition $\mathbb{F}_p^n$ into $p$ affine subspaces of
codimension 1. Therefore, one can partition $\mathbb{F}_p^n$ into at most $p^{|K|}$ affine
subspaces of codimension $|K|$ such that $f_1$ is constant on each of them. This is the sense
in which $f_1$ is "highly structured".

The basic aim of quadratic Fourier analysis is to carry out a similar decomposition for
the $U^3$-norm. That is, given a function $f$, we would like to write $f$ as a sum $f_1+f_2$, where
$f_1$ is "structured" and $f_2$ is quadratically uniform. Now this is a stronger (in fact, much
stronger) property to demand of $f_2$, so we are forced to accept a weaker notion of structure
for $f_1$.

Obtaining any sort of structure at all is significantly harder than it is for the $U^2$-norm,
and results in this direction are much more recent. The first steps were taken in [Gow98]
and [Gow01] for the group $\mathbb{Z}_N$ in order to give an analytic proof of Szemerédi's theorem.
The structure of that proof was as follows: Theorem 2.2 (of the present paper) can be
used to show that if a set $A$ is sufficiently uniform of degree $k-2$, then it must contain an
arithmetic progression of length $k$. Then an argument that is fairly easy when $k=3$ but
much harder when $k\ge4$ can be used to show that if $A$ is not $c$-uniform of degree $k$, then
it must have "local correlation" with a function of the form $\omega^{\phi(x)}$, where $\omega=\exp(2\pi i/N)$
and $\phi$ is a polynomial of degree $d$. "Local" in this context means that one can partition
$\mathbb{Z}_N$ into arithmetic progressions of size $N^\eta$ (for some $\eta$ that depends on $c$ and $k$ only) on
a large proportion of which one can find such a correlation.

This was strong enough to prove Szemerédi's theorem, but for several other applications
the highly local nature of the correlation is too weak. However, in the quadratic case,
this problem has been remedied by Green and Tao [GT05a]. In this case, the obstacle to
"globalizing" the argument is that a certain globally-defined bilinear form that occurs in the
proof of [Gow01] is not symmetric, and thus does not allow one to define a corresponding
globally-defined quadratic form. (In the context of $\mathbb{Z}_N$, "global" means something like
"defined on a proportional-sized Bohr set". For $\mathbb{F}_p^n$ one can take it to mean "defined
everywhere".) Green and Tao discovered an ingenious "symmetry argument" that allows
one to replace the bilinear form by one that is symmetric, and this allowed them to prove
a quadratic structure theorem for functions with large $U^3$-norm that is closely analogous
to the linear structure theorem that follows from conventional Fourier analysis.

An excellent exposition of this structure theorem when the group $G$ is a vector space
over a finite field can be found in [Gre05b]. This contains proofs of all the background
results that we state here.

Recall that in the linear case, we called $f_1$ "structured" because it was constant on affine
subspaces of low codimension. For quadratic Fourier analysis, we need a quadratic analogue
of the notion of a decomposition of $\mathbb{F}_p^n$ into parallel affine subspaces of codimension $d_1$. In
order to define such a decomposition, one can take a surjective linear map $\Gamma_1:\mathbb{F}_p^n\to\mathbb{F}_p^{d_1}$
and for each $a\in\mathbb{F}_p^{d_1}$ one can set $V_a$ to equal $\Gamma_1^{-1}(\{a\})$. If we want to make this idea
quadratic, we should replace the linear map $\Gamma_1$ by a "quadratic map" $\Gamma_2$, which we do in
a natural way as follows. We say that a function $\Gamma_2:\mathbb{F}_p^n\to\mathbb{F}_p^{d_2}$ is quadratic if it is of the
form $x\mapsto(q_1(x),\dots,q_{d_2}(x))$, where $q_1,\dots,q_{d_2}$ are quadratic forms on $\mathbb{F}_p^n$. Then, for each
$b\in\mathbb{F}_p^{d_2}$ we define $W_b$ to be $\{x\in\mathbb{F}_p^n:\Gamma_2(x)=b\}$.

In [GT05b], Green and Tao define $\mathcal{B}_1$ to be the algebra generated by the sets $V_a$ and
$\mathcal{B}_2$ to be the finer algebra generated by the sets $V_a\cap W_b$. They call $\mathcal{B}_1$ a linear factor of
complexity $d_1$ and $(\mathcal{B}_1,\mathcal{B}_2)$ a quadratic factor of complexity $(d_1,d_2)$. This is to draw out a
close analogy with the "characteristic factors" that occur in ergodic theory.

These definitions give us a suitable notion of a "quadratically structured" function: it
is a function $f_1$ for which we can find a linear map $\Gamma_1:\mathbb{F}_p^n\to\mathbb{F}_p^{d_1}$ and a quadratic map
$\Gamma_2:\mathbb{F}_p^n\to\mathbb{F}_p^{d_2}$ such that $d_1$ and $d_2$ are not too large and $f_1$ is constant on the sets $V_a\cap W_b$
defined above. This is equivalent to saying that $f_1$ is measurable with respect to the algebra
$\mathcal{B}_2$, and also to saying that $f_1(x)$ depends on $(\Gamma_1(x),\Gamma_2(x))$ only.

The quadratic structure theorem of Green and Tao implies that a bounded function $f$
defined on $\mathbb{F}_p^n$ can be written as a sum $f_1+f_2$, where $f_1$ is quadratically structured in the
above sense, and $\|f_2\|_{U^3}$ is small. In [GT05b] the result is stated explicitly for $p=5$, but
this is merely because of the emphasis placed on 4-term progressions. The proof is not
affected by the choice of $p$ (as long as it stays fixed).

In the statement below, we write $\mathbb{E}(f|\mathcal{B}_2)$ for the conditional expectation, or averaging
projection, of $f$. That is, if $X=V_a\cap W_b$ is an atom of $\mathcal{B}_2$ and $x\in X$, then $\mathbb{E}(f|\mathcal{B}_2)(x)$ is
the average of $f$ over $X$. Since the function $\mathbb{E}(f|\mathcal{B}_2)$ is constant on the sets $V_a\cap W_b$, it is
quadratically structured in the sense that interests us.

Theorem 3.3. [GT05b] Let $p$ be a fixed prime, let $\delta>0$ and suppose that $n>n_0(\delta)$
is sufficiently large. Given any function $f:\mathbb{F}_p^n\to[-1,1]$, there exists a quadratic factor
$(\mathcal{B}_1,\mathcal{B}_2)$ of complexity at most $((4\delta^{-1})^{3C_0+1},(4\delta^{-1})^{2C_0+1})$ together with a decomposition

$$f=f_1+f_2,$$

where

$$f_1:=\mathbb{E}(f|\mathcal{B}_2)\quad\text{and}\quad\|f_2\|_{U^3}\le\delta.$$

The absolute constant $C_0$ can be taken to be $2^{16}$.

As it stands, the above theorem is not quite suitable for applications, because technical
problems arise if one has to deal with quadratic forms of low rank. (Notice that so far
we have said nothing about the quadratic forms $q_i$, not even that they are distinct.) Let
$\Gamma_2=(q_1,\dots,q_{d_2})$ be a quadratic map and for each $i$ let $\beta_i$ be the symmetric bilinear form
corresponding to $q_i$: that is, $\beta_i(x,y)=(q_i(x+y)-q_i(x)-q_i(y))/2$. We shall say that $\Gamma_2$ is
of rank at least $r$ if the bilinear form $\sum_i\lambda_i\beta_i$ has rank at least $r$ whenever $\lambda_1,\dots,\lambda_{d_2}$ are
elements of $\mathbb{F}_p$ that are not all zero. If $\Gamma_2$ is used in combination with some linear map $\Gamma_1$
to define a quadratic factor $(\mathcal{B}_1,\mathcal{B}_2)$, then we shall also say that this quadratic factor has
rank at least $r$.

Just to clarify this definition, let us prove a simple lemma that will be used later.

Lemma 3.4. Let $\beta$ be a symmetric bilinear form of rank $r$ on $\mathbb{F}_p^n$ and let $W$ be a subspace
of $\mathbb{F}_p^n$ of codimension $d_1$. Then the rank of the restriction of $\beta$ to $W$ is at least $r-2d_1$.

Proof. Let $V=\mathbb{F}_p^n$. For every subspace $W$ of $V$, let us write $W^\perp$ for the subspace

$$\{v\in V:\beta(v,w)=0\text{ for every }w\in W\}.$$

Let us define the nullity of $\beta$ to be the dimension of $V^\perp$. Then the rank of $\beta$ is equal to $n$
minus its nullity, which is the codimension of $V^\perp$. We are assuming that this is $r$.

Now let $W$ have codimension $d_1$. We begin by bounding from above the dimension of
$W^\perp$. To do this, let $Y$ be a complement for $W$, which, by hypothesis, will have dimension
$d_1$. Then $V^\perp=W^\perp\cap Y^\perp$ and $Y^\perp$ has dimension at least $n-d_1$, so the dimension of
$W^\perp$ is at most $d_1+\dim(V^\perp)$, which, by hypothesis, equals $d_1+n-r$. Therefore, the
codimension of $W^\perp$ is at least $r-d_1$, which implies that the codimension of $W\cap W^\perp$ inside $W$
is at least $r-2d_1$. Since $W\cap W^\perp$ is the kernel of the restriction of $\beta$ to $W$, this implies the result. □
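
The loss of $2d_1$ in Lemma 3.4 cannot be improved in general, as a small computation shows. The Python sketch below is illustrative only: the helper names `rank_mod_p` and `restricted_gram`, and the choice of the "hyperbolic" form $\beta(x,y)=x_1y_2+x_2y_1+x_3y_4+x_4y_3$ on $\mathbb{F}_3^4$, are assumptions made for the example and are not taken from the paper. It computes the rank of the Gram matrix of $\beta$ restricted to a subspace given by a basis.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix over F_p, by Gaussian elimination."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(v * inv) % p for v in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                factor = m[i][col]
                m[i] = [(a - factor * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

def restricted_gram(B, basis, p):
    """Gram matrix (beta(u, v)) for u, v in a basis of W, where beta(x, y) = x^T B y."""
    n = len(B)
    def beta(u, v):
        return sum(u[i] * B[i][j] * v[j] for i in range(n) for j in range(n)) % p
    return [[beta(u, v) for v in basis] for u in basis]

p = 3
B = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # rank 4 on F_3^4
e1, e2, e3 = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)
rank_codim_1 = rank_mod_p(restricted_gram(B, [e1, e2, e3], p), p)  # W of codimension 1
rank_codim_2 = rank_mod_p(restricted_gram(B, [e1, e3], p), p)      # W of codimension 2
```

In both cases the restricted rank equals $r-2d_1$ exactly, so the bound in the lemma is attained for this form.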

We are now in a position to state the version of the structure theorem that we shall be
using. It can be read out of (but is not explicitly stated in) [Gre05b] and [GT05b].

Theorem 3.5. Let $p$ be a fixed prime, let $\delta>0$, let $r:\mathbb{N}\to\mathbb{N}$ be an arbitrary function
(which may depend on $\delta$) and suppose that $n>n_0(r,\delta)$ is sufficiently large. Then given
any function $f:\mathbb{F}_p^n\to[-1,1]$, there exists $d_0=d_0(r,\delta)$ and a quadratic factor $(\mathcal{B}_1,\mathcal{B}_2)$
of rank at least $r(d_1+d_2)$ and complexity at most $(d_1,d_2)$, $d_1,d_2\le d_0$, together with a
decomposition

$$f=f_1+f_2+f_3,$$

where

$$f_1:=\mathbb{E}(f|\mathcal{B}_2),\quad\|f_2\|_2\le\delta\quad\text{and}\quad\|f_3\|_{U^3}\le\delta.$$

Note that $\mathbb{E}f_1=\mathbb{E}f$. In particular $\mathbb{E}f_1=0$ whenever $f$ is the balanced function of a
subset of $\mathbb{F}_p^n$. It can be shown that $f_1$ is uniform whenever $f$ is uniform: roughly speaking,
the reason for this is that $\mathbb{E}(f|\mathcal{B}_1)$ is approximately zero and the atoms of $\mathcal{B}_2$ are uniform
subsets of the atoms of $\mathcal{B}_1$. However, we shall not need this fact.

We shall apply Theorem 3.5 when $r$ is the function $d\mapsto2md+C$ for a constant $C$.
Unfortunately, ensuring that factors have high rank is an expensive process: even for
this modest function the argument involves an iteration that increases $d_0$ exponentially at
every step. For this reason we have stated the theorem in a qualitative way. A quantitative
version would involve a tower-type bound.

3.3. Square-independence is sufficient. We now have the tools we need to show that
square-independence coupled with CS-complexity 2 is sufficient to guarantee the correct
number of solutions in uniform sets. The basic idea of the proof is as follows. Given a set
$A\subset\mathbb{F}_p^n$ of density $\alpha$, we first replace it by its balanced function $f(x)=A(x)-\alpha$. Given a
square-independent linear system $\mathcal{L}$ of complexity at most 2, our aim is to show, assuming
that $\|f\|_{U^2}$ is sufficiently small, that

$$\mathbb{E}_{\mathbf{x}\in(\mathbb{F}_p^n)^d}\prod_{i=1}^mf(L_i(\mathbf{x}))$$

is also small. (Once we have done that, it will be straightforward to show that the same
average, except with $A$ replacing $f$, is close to $\alpha^m$.) In order to carry out this estimate, we
first apply the structure theorem to decompose $f$ as $f_1+f_2+f_3$, where $f_1$ is quadratically
structured, $f_2$ is small in $L_2$ and $f_3$ is quadratically uniform. This then allows us to
decompose the product into a sum of $3^m$ products, one for each way of choosing $f_1$, $f_2$ or
$f_3$ from each of the $m$ brackets. If we ever choose $f_2$, then the Cauchy-Schwarz inequality
implies that the corresponding term is small, and if we ever choose $f_3$ then a similar
conclusion follows from Theorem 2.3. Thus, the most important part of the proof is to use
the linear uniformity and quadratic structure of $f_1$ to prove that the product

$$\mathbb{E}_{\mathbf{x}\in(\mathbb{F}_p^n)^d}\prod_{i=1}^mf_1(L_i(\mathbf{x}))$$

is small. This involves a calculation that generalizes the one we used to prove Theorem
3.1. The main step is the following lemma, where we do the calculation in the case where
the linear factor $\mathcal{B}_1$ is trivial.

Lemma 3.6. Let $\mathcal{L}=(L_1,\dots,L_m)$ be a square-independent system of linear forms and let
$\Gamma_2=(q_1,\dots,q_{d_2})$ be a quadratic map from $\mathbb{F}_p^n$ to $\mathbb{F}_p^{d_2}$ of rank at least $r$. Let $\phi_1,\dots,\phi_m$ be
linear maps from $(\mathbb{F}_p^n)^d$ to $\mathbb{F}_p^{d_2}$ and let $b_1,\dots,b_m$ be elements of $\mathbb{F}_p^{d_2}$. Let $\mathbf{x}=(x_1,\dots,x_d)$
be a randomly chosen element of $(\mathbb{F}_p^n)^d$. Then the probability that $\Gamma_2(L_i(\mathbf{x}))=\phi_i(\mathbf{x})+b_i$
for every $i$ differs from $p^{-md_2}$ by at most $p^{-r/2}$.

Proof. Let $\Lambda$ be the set of all $m\times d_2$ matrices $\lambda=(\lambda_{ij})$ over $\mathbb{F}_p$ and let us write $\phi_i=
(\phi_{i1},\dots,\phi_{id_2})$ and $b_i=(b_{i1},\dots,b_{id_2})$ for each $i$. The probability we are interested in is the
probability that $q_j(L_i(\mathbf{x}))=\phi_{ij}(\mathbf{x})+b_{ij}$ for every $i\le m$ and every $j\le d_2$. This equals

$$\mathbb{E}_{\mathbf{x}}\mathbb{E}_{\lambda\in\Lambda}\prod_{i=1}^m\prod_{j=1}^{d_2}\omega^{\lambda_{ij}(q_j(L_i(\mathbf{x}))-\phi_{ij}(\mathbf{x})-b_{ij})},$$

since if $q_j(L_i(\mathbf{x}))=\phi_{ij}(\mathbf{x})+b_{ij}$ for every $i$ and $j$, then the expectation over $\lambda$ is 1, and
otherwise if we choose $i$ and $j$ such that $q_j(L_i(\mathbf{x}))\ne\phi_{ij}(\mathbf{x})+b_{ij}$ and consider the expectation
over $\lambda_{ij}$ while all other entries of $\lambda$ are fixed, then we see that the expectation over $\lambda$ is
zero.

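We can rewrite the above expectation as

$$\mathbb{E}_{\lambda\in\Lambda}\mathbb{E}_{\mathbf{x}}\,\omega^{\sum_{i,j}\lambda_{ij}(q_j(L_i(\mathbf{x}))-\phi_{ij}(\mathbf{x})-b_{ij})}.$$

If $\lambda=0$, then obviously the expectation over $\mathbf{x}$ is 1. This happens with probability $p^{-md_2}$.
Otherwise, for each $i$ let us say that the coefficients of $L_i$ are $c_{i1},\dots,c_{id}$. That is, let
$L_i(\mathbf{x})=\sum_{u=1}^dc_{iu}x_u$. Then

$$q_j(L_i(\mathbf{x}))=\sum_{u,v}c_{iu}c_{iv}\beta_j(x_u,x_v),$$

where $\beta_j$ is the bilinear form associated with $q_j$. Choose some $j$ such that $\lambda_{ij}$ is non-zero
for at least one $i$. Then the square-independence of the linear forms $L_i$ implies that there
exist $u$ and $v$ such that $\sum_i\lambda_{ij}c_{iu}c_{iv}$ is not zero.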

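Fix such a $j$, $u$ and $v$ and do it in such a way that $u=v$, if this is possible. We shall
now consider the expectation as $x_u$ and $x_v$ vary with every other $x_t$ fixed. Notice first that

$$\sum_{i,j}\lambda_{ij}q_j(L_i(\mathbf{x}))=\sum_{i,j}\sum_{t,w}\lambda_{ij}c_{it}c_{iw}\beta_j(x_t,x_w).$$

Let us write $\beta_{tw}$ for the bilinear form $\sum_{i,j}\lambda_{ij}c_{it}c_{iw}\beta_j$, so that this becomes $\sum_{t,w}\beta_{tw}(x_t,x_w)$.
Let us also write $\phi(\mathbf{x})$ for $\sum_{i,j}\lambda_{ij}\phi_{ij}(\mathbf{x})$ and let $\phi_1,\dots,\phi_d$ be linear maps from $\mathbb{F}_p^n$ to $\mathbb{F}_p$
such that $\phi(\mathbf{x})=\sum_t\phi_t(x_t)$ for every $\mathbf{x}$. Then

$$\sum_{i,j}\lambda_{ij}(q_j(L_i(\mathbf{x}))-\phi_{ij}(\mathbf{x}))=\sum_{t,w}\beta_{tw}(x_t,x_w)-\sum_t\phi_t(x_t).$$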

Notice that if we cannot get $u$ to equal $v$, then $\sum_i\lambda_{ij}c_{iu}^2=0$ for every $u$ and every $j$, which
implies that $\beta_{uu}=0$. Notice also that the assumption that $\Gamma_2$ has rank at least $r$ and the
fact that $\sum_i\lambda_{ij}c_{iu}c_{iv}\ne0$ for at least one $j$ imply that $\beta_{uv}$ has rank at least $r$.

If we fix every $x_t$ except for $x_u$ and $x_v$, then $\sum_{t,w}\beta_{tw}(x_t,x_w)-\sum_t\phi_t(x_t)$ is a function
of $x_u$ and $x_v$ of the form

$$\beta_{uv}(x_u,x_v)+\psi_u(x_u)+\psi_v(x_v),$$

where $\psi_u$ and $\psi_v$ are linear functionals on $\mathbb{F}_p^n$ (that depend on the other $x_t$).
Now let us estimate the expectation

$$\mathbb{E}\,\omega^{\sum_{i,j}\lambda_{ij}(q_j(L_i(\mathbf{x}))-\phi_{ij}(\mathbf{x})-b_{ij})},$$

where we have fixed every $x_t$ apart from $x_u$ and $x_v$. Letting $b=\sum_{i,j}\lambda_{ij}b_{ij}$ and using the
calculations we have just made, we can write this in the form

$$\mathbb{E}_{x_u,x_v}\,\omega^{\beta_{uv}(x_u,x_v)+\psi_u(x_u)+\psi_v(x_v)-b}.$$

If $u=v$, then the expectation is just over $x_u$ and the exponent has the form $q(x_u)+w^Tx_u-b$
for some quadratic form $q$ of rank at least $r$. Therefore, by Lemma 3.2 the expectation
is at most $p^{-r/2}$. If $u\ne v$ (and therefore every $\beta_{uu}$ is zero) then for each $x_v$ the exponent
is linear in $x_u$. This means that either the expectation over $x_u$ is zero or the function
$x_u\mapsto\beta_{uv}(x_u,x_v)+\psi_u(x_u)$ is constant. If the latter is true when $x_v=y$ and when $x_v=z$, then
$\beta_{uv}(x_u,y-z)$ is also constant, and therefore identically zero. Since $\beta_{uv}$ has rank at least
$r$, $y-z$ must lie in a subspace of codimension at least $r$. Therefore, the set of $x_v$ such
that $\beta_{uv}(x_u,x_v)+\psi_u(x_u)$ is constant is an affine subspace of $\mathbb{F}_p^n$ of codimension at least $r$,
which implies that the probability (for a randomly chosen $x_v$) that the expectation (over
$x_u$) is non-zero is at most $p^{-r}$. When the expectation is non-zero, it has modulus 1.
In either case, we find that, for any non-zero $\lambda\in\Lambda$, the expectation over $\mathbf{x}$ is at most
$p^{-r/2}$, and this completes the proof of the lemma. □
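
Lemma 3.6 can be tested by brute force in a small case. The Python sketch below makes several simplifying assumptions that are not in the paper: $d_2=1$ with the single quadratic form $q(v)=v_1^2+\dots+v_n^2$ (which has rank $n$ when $p$ is odd), all the linear maps $\phi_i$ equal to zero, and the square-independent system $(x,y,x+y)$ over $\mathbb{F}_3^4$. It tabulates the joint distribution of $(q(L_1(\mathbf{x})),\dots,q(L_m(\mathbf{x})))$ so that it can be compared with $p^{-md_2}$.

```python
from itertools import product
from collections import Counter

def q(v, p):
    """q(v) = v_1^2 + ... + v_n^2, a quadratic form of rank n over F_p (p odd)."""
    return sum(c * c for c in v) % p

def joint_distribution(forms, p, n):
    """Distribution of (q(L_1(x)), ..., q(L_m(x))) for x uniform in (F_p^n)^d.

    Each form is a tuple of d coefficients (c_1, ..., c_d), meaning sum_u c_u x_u.
    """
    d = len(forms[0])
    counts = Counter()
    for x in product(product(range(p), repeat=n), repeat=d):
        values = []
        for coeffs in forms:
            Lx = tuple(sum(c * x[u][k] for u, c in enumerate(coeffs)) % p
                       for k in range(n))
            values.append(q(Lx, p))
        counts[tuple(values)] += 1
    total = p ** (n * d)
    return {b: counts[b] / total for b in product(range(p), repeat=len(forms))}

p, n = 3, 4
forms = [(1, 0), (0, 1), (1, 1)]  # the system (x, y, x + y)
dist = joint_distribution(forms, p, n)
main_term = p ** -len(forms)      # p^(-m d_2) with d_2 = 1
error_bound = p ** (-n / 2)       # p^(-r/2) with r = n
```

Every probability in `dist` should lie within `error_bound` of `main_term`, which is what the lemma asserts in this special case.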

We now want to take into account $\Gamma_1$ as well as $\Gamma_2$. This turns out to be a short deduction
from the previous result. First let us do a simple piece of linear algebra.

Lemma 3.7. Let $\mathcal{L}=(L_1,\dots,L_m)$ be a collection of linear forms in $d$ variables, and
suppose that the linear span of $L_1,\dots,L_m$ has dimension $d'$. Let $\Gamma_1:\mathbb{F}_p^n\to\mathbb{F}_p^{d_1}$ be a
surjective linear map and let $\phi:(\mathbb{F}_p^n)^d\to(\mathbb{F}_p^{d_1})^m$ be defined by the formula

$$\phi:\mathbf{x}\mapsto(\Gamma_1(L_1(\mathbf{x})),\dots,\Gamma_1(L_m(\mathbf{x}))).$$

Then the image of $\phi$ is the subspace $Z$ of $(\mathbb{F}_p^{d_1})^m$ that consists of all sequences $(a_1,\dots,a_m)$
such that $\sum_i\mu_ia_i=0$ whenever $\sum_i\mu_iL_i=0$. The dimension of $Z$ is $d'd_1$.

Proof. Since the $m$ forms $L_i$ span a space of dimension $d'$, the set of sequences $\mu=
(\mu_1,\dots,\mu_m)$ such that $\sum_i\mu_iL_i=0$ is a linear subspace $W$ of $\mathbb{F}_p^m$ of dimension $m-d'$.
Therefore, the condition that $\sum_i\mu_ia_i=0$ for every sequence $\mu\in W$ restricts $(a_1,\dots,a_m)$
to a subspace of $(\mathbb{F}_p^{d_1})^m$ of codimension $d_1(m-d')$. (An easy way to see this is to write
$a_i=(a_{i1},\dots,a_{id_1})$ and note that for each $j$ the sequence $(a_{1j},\dots,a_{mj})$ is restricted to a
subspace of codimension $m-d'$.) Therefore, the dimension of $Z$ is $d'd_1$, as claimed.

Now let us show that $Z$ is the image of $\phi$. Since $\Gamma_1$ is linear, $Z$ certainly contains the
image of $\phi$, so it will be enough to prove that the rank of $\phi$ is $d'd_1$.

Abusing notation, let us write $\Gamma_1(\mathbf{x})$ for the sequence $(\Gamma_1x_1,\dots,\Gamma_1x_d)$, which belongs to
$(\mathbb{F}_p^{d_1})^d$. Then $\phi(\mathbf{x})$ can be rewritten as $(L_1(\Gamma_1(\mathbf{x})),\dots,L_m(\Gamma_1(\mathbf{x})))$. Since $\Gamma_1$ is a surjection,
it is also a surjection when considered as a map on $(\mathbb{F}_p^n)^d$. Therefore, the rank of $\phi$ is the
rank of the map $\psi:(\mathbb{F}_p^{d_1})^d\to(\mathbb{F}_p^{d_1})^m$ defined by

$$\psi:\mathbf{y}\mapsto(L_1(\mathbf{y}),\dots,L_m(\mathbf{y})).$$

Since the $L_i$ span a space of dimension $d'$, the nullity of this map is $d_1(d-d')$, so its rank
is $d_1d'$. Therefore, the image of $\phi$ is indeed $Z$. □

Lemma 3.8. Let $\mathcal{L}=(L_1,\dots,L_m)$ be a square-independent system of linear forms in $d$
variables, and suppose that the linear span of $L_1,\dots,L_m$ has dimension $d'$. Let $\Gamma_1:\mathbb{F}_p^n\to
\mathbb{F}_p^{d_1}$ be a surjective linear map and let $\Gamma_2:\mathbb{F}_p^n\to\mathbb{F}_p^{d_2}$ be a quadratic map of rank at least
$r$. Let $a_1,\dots,a_m$ be elements of $\mathbb{F}_p^{d_1}$ and let $b_1,\dots,b_m$ be elements of $\mathbb{F}_p^{d_2}$, and let $\phi$ and $Z$
be as defined in the previous lemma. Then the probability, if $\mathbf{x}$ is chosen randomly from
$(\mathbb{F}_p^n)^d$, that $\Gamma_1(L_i(\mathbf{x}))=a_i$ and $\Gamma_2(L_i(\mathbf{x}))=b_i$ for every $i\le m$ is zero if $(a_1,\dots,a_m)\notin Z$,
and otherwise it differs from $p^{-d'd_1-md_2}$ by at most $p^{d_1-d'd_1-r/2}$.

Proof. If $\mathbf{a}=(a_1,\dots,a_m)\notin Z$, then there exists $\mu\in\mathbb{F}_p^m$ such that $\sum_i\mu_ia_i\ne0$ and
$\sum_i\mu_iL_i(\mathbf{x})=0$ for every $\mathbf{x}$. Since $\Gamma_1$ is linear, it follows that there is no $\mathbf{x}$ such that
$\Gamma_1(L_i(\mathbf{x}))=a_i$ for every $i$.

Otherwise, by Lemma 3.7, $\mathbf{a}$ lies in the image of $\phi$, which has rank $d'd_1$, so $\phi^{-1}(\{\mathbf{a}\})$ is
an affine subspace of $(\mathbb{F}_p^n)^d$ of codimension $d'd_1$. Therefore, the probability that $\phi(\mathbf{x})=\mathbf{a}$
is $p^{-d'd_1}$. Now let us use Lemma 3.6 to estimate the probability, conditional on this, that
$\Gamma_2(L_i(\mathbf{x}))=b_i$ for every $i\le m$.

In the proof of Lemma 3.7 we observed that $\phi(\mathbf{x})$ depends on $\Gamma_1(\mathbf{x})$ only, so we shall
estimate the required probability, given the value of $\Gamma_1(\mathbf{x})$. (Recall that this is notation for
$(\Gamma_1x_1,\dots,\Gamma_1x_d)$.) In order to specify the set on which we are conditioning, let $V$ be the
kernel of $\Gamma_1$ (considered as a map defined on $\mathbb{F}_p^n$), and given a sequence $(w_1,\dots,w_d)\in(\mathbb{F}_p^n)^d$,
let us estimate the required probability, given that $x_u\in V+w_u$ for every $u$.

Let us write $x_u=y_u+w_u$ with $y_u\in V$. Thus, we are estimating the probability that $\Gamma_2(L_i(\mathbf{y}+\mathbf{w}))=
b_i$ for every $i\le m$. But for each $i$ we can write $\Gamma_2(L_i(\mathbf{y}+\mathbf{w}))$ as $\Gamma_2(L_i(\mathbf{y}))+\phi_i(\mathbf{y})+b_i'$ for
some linear function $\phi_i:V^d\to\mathbb{F}_p^{d_2}$ and some vector $b_i'\in\mathbb{F}_p^{d_2}$.

Because $\Gamma_2$ has rank at least $r$ and the codimension of $V$ in $\mathbb{F}_p^n$ is $d_1$, Lemma 3.4 implies
that the rank of the restriction of $\Gamma_2$ to $V$ is at least $r-2d_1$. Therefore, by Lemma 3.6,
the probability that $\Gamma_2(L_i(\mathbf{y}))=-\phi_i(\mathbf{y})+b_i-b_i'$ for every $i$ differs from $p^{-md_2}$ by at most
$p^{d_1-r/2}$.

Since this is true for all choices of $\mathbf{w}$, we have the same estimate if we condition on the
event that $\phi(\mathbf{x})=\mathbf{a}$ for some fixed $\mathbf{a}\in Z$. Therefore, the probability that $\Gamma_1(L_i(\mathbf{x}))=a_i$
and $\Gamma_2(L_i(\mathbf{x}))=b_i$ for every $i$ differs from $p^{-d'd_1-md_2}$ by at most $p^{d_1-d'd_1-r/2}$, as claimed. □

Next, we observe that Lemma 3.8 implies that all the atoms of $\mathcal{B}_2$ have approximately
the same size.

Corollary 3.9. Let $\Gamma_1$ and $\Gamma_2$ be as above and let $x$ be a randomly chosen element of $\mathbb{F}_p^n$.
Then for every $a\in\mathbb{F}_p^{d_1}$ and every $b\in\mathbb{F}_p^{d_2}$ the probability that $\Gamma_1(x)=a$ and $\Gamma_2(x)=b$
differs from $p^{-d_1-d_2}$ by at most $p^{-r/2}$.

Proof. Let us apply Lemma 3.8 in the case where $\mathcal{L}$ consists of the single one-variable linear
form $L(x)=x$. This has linear rank 1 and is square-independent, so when we apply the
lemma we have $d'=m=1$. If we let $a_1=a$ and $b_1=b$, then the conclusion of the lemma
tells us precisely what is claimed. □

The next two lemmas are simple technical facts about projections on to linear factors.
The first one tells us that if $g$ is any function that is uniform and constant on the atoms
of a linear factor, then it has small $L_2$-norm. The second tells us that projecting on to a
linear factor decreases the $U^2$-norm.

Lemma 3.10. Let $G$ be a function from $\mathbb{F}_p^{d_1}$ to $[-1,1]$, let $\Gamma_1:\mathbb{F}_p^n\to\mathbb{F}_p^{d_1}$ be a surjective
linear map and let $g=G\circ\Gamma_1$. Then $\|g\|_2^4\le p^{d_1}\|g\|_{U^2}^4$.

Proof. Since $\Gamma_1$ takes each value in $\mathbb{F}_p^{d_1}$ the same number of times, $\|g\|_{U^2}=\|G\|_{U^2}$. But

$$\|G\|_{U^2}^4=\mathbb{E}_a(\mathbb{E}_bG(b)G(b+a))^2\ge p^{-d_1}(\mathbb{E}_bG(b)^2)^2=p^{-d_1}\|G\|_2^4,$$

which proves the result, since $\|g\|_2=\|G\|_2$ as well. □

Lemma 3.11. Let $f$ be a function from $\mathbb{F}_p^n$ to $\mathbb{R}$, let $\mathcal{B}_1$ be a linear factor and let $g=
\mathbb{E}(f|\mathcal{B}_1)$. Then $\|g\|_{U^2}\le\|f\|_{U^2}$.

Proof. On every atom of $\mathcal{B}_1$, $g$ is constant and $f-g$ averages zero. Let $\Gamma_1$ be the linear
map that defines $\mathcal{B}_1$ and, as we did earlier, for each $a\in\mathbb{F}_p^{d_1}$ let $V_a$ stand for $\Gamma_1^{-1}(\{a\})$.
Then

$$\|f\|_{U^2}^4=\mathbb{E}_{a_1+a_2=a_3+a_4}\mathbb{E}_{x_1+x_2=x_3+x_4,\ x_i\in V_{a_i}}f(x_1)f(x_2)f(x_3)f(x_4).$$

Let us fix a choice of $a_1+a_2=a_3+a_4$ and consider the inner expectation. Setting $g'=f-g$
and writing $\lambda_i$ for the constant value of $g$ on $V_{a_i}$, this has the form

$$\mathbb{E}_{x_1+x_2=x_3+x_4,\ x_i\in V_{a_i}}(\lambda_1+g'(x_1))(\lambda_2+g'(x_2))(\lambda_3+g'(x_3))(\lambda_4+g'(x_4)).$$

This splits into sixteen parts. Each part that involves at least one $\lambda_i$ and at least one $g'(x_i)$
is zero, because any three of the $x_i$s can vary independently and $g'$ averages zero on every
atom of $\mathcal{B}_1$. This means that the expectation is

$$\lambda_1\lambda_2\lambda_3\lambda_4+\mathbb{E}_{x_1+x_2=x_3+x_4,\ x_i\in V_{a_i}}g'(x_1)g'(x_2)g'(x_3)g'(x_4).$$

If we now take expectations over $a_1+a_2=a_3+a_4$ we find that $\|f\|_{U^2}^4=\|g\|_{U^2}^4+\|f-g\|_{U^2}^4$.
Notice that this is a general result about how the $U^2$-norm of a function is related to the
$U^2$-norm of a projection on to a linear factor. □
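
The identity $\|f\|_{U^2}^4=\|g\|_{U^2}^4+\|f-g\|_{U^2}^4$ obtained in this proof can be checked directly in a tiny example. The Python sketch below assumes $p=3$, $n=2$ and the linear factor defined by $\Gamma_1(x)=x_1$ (the first coordinate); the function $f$ is an arbitrary illustrative choice, and the helper names are hypothetical.

```python
from itertools import product

def u2_fourth_power(f, p, n):
    """||f||_{U^2}^4 = E_{x,a,b} f(x) f(x+a) f(x+b) f(x+a+b) for real f on F_p^n."""
    pts = list(product(range(p), repeat=n))
    def add(u, v):
        return tuple((s + t) % p for s, t in zip(u, v))
    total = 0.0
    for x in pts:
        for a in pts:
            for b in pts:
                total += f[x] * f[add(x, a)] * f[add(x, b)] * f[add(add(x, a), b)]
    return total / len(pts) ** 3

def project_onto_first_coordinate(f, p, n):
    """Conditional expectation onto the linear factor with atoms {x : x[0] = a}."""
    pts = list(product(range(p), repeat=n))
    average = {}
    for a in range(p):
        atom = [x for x in pts if x[0] == a]
        average[a] = sum(f[x] for x in atom) / len(atom)
    return {x: average[x[0]] for x in pts}

p, n = 3, 2
pts = list(product(range(p), repeat=n))
f = {x: ((x[0] * x[0] + x[0] * x[1] + 2 * x[1]) % p) - 1.0 for x in pts}
g = project_onto_first_coordinate(f, p, n)
difference = {x: f[x] - g[x] for x in pts}
lhs = u2_fourth_power(f, p, n)
rhs = u2_fourth_power(g, p, n) + u2_fourth_power(difference, p, n)
```

The values `lhs` and `rhs` should coincide up to rounding, and in particular the $U^2$-norm of `g` cannot exceed that of `f`.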

Now we are ready to estimate the product we are interested in, for functions that are
constant on the atoms of $\mathcal{B}_2$.

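Lemma 3.12. Let $\Gamma_1:\mathbb{F}_p^n\to\mathbb{F}_p^{d_1}$ be a linear function and $\Gamma_2:\mathbb{F}_p^n\to\mathbb{F}_p^{d_2}$ be a quadratic
function. Let $(\mathcal{B}_1,\mathcal{B}_2)$ be the corresponding quadratic factor and suppose that this has rank
at least $r$. Let $c>0$ and let $f:\mathbb{F}_p^n\to[-1,1]$ be a function with $\|f\|_{U^2}\le c$ and let
$f_1=\mathbb{E}(f|\mathcal{B}_2)$. Let $\mathcal{L}=(L_1,\dots,L_m)$ be a square-independent system of linear forms. Then

$$\left|\mathbb{E}_{\mathbf{x}\in(\mathbb{F}_p^n)^d}\prod_{i=1}^mf_1(L_i(\mathbf{x}))\right|\le4^mcp^{d_1/4}+2^{m+1}p^{m(d_1+d_2)-r/2}.$$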

Proof. Let $g=\mathbb{E}(f_1|\mathcal{B}_1)$ and let $h=f_1-g$. Then $\|g\|_1\le\|g\|_2\le p^{d_1/4}\|g\|_{U^2}$, by Cauchy-Schwarz
and Lemma 3.10. By Lemma 3.11, $\|g\|_{U^2}\le\|f\|_{U^2}$, which is at most $c$, by
hypothesis. Therefore, $\|g\|_1\le cp^{d_1/4}$.

Since $f_1=g+h$, we can split the product up into a sum of $2^m$ products, in each of which
we replace $f_1(L_i(\mathbf{x}))$ by either $g(L_i(\mathbf{x}))$ or $h(L_i(\mathbf{x}))$. Since $\|g\|_1\le cp^{d_1/4}$ and $\|h\|_\infty\le2$, any
product that involves at least one $g$ has average at most $2^mcp^{d_1/4}$. It remains to estimate

$$\mathbb{E}_{\mathbf{x}}\prod_{i=1}^mh(L_i(\mathbf{x})).$$

Let $Z$ be as defined in Lemma 3.7 and for each $\mathbf{a}=(a_1,\dots,a_m)$ and $\mathbf{b}=(b_1,\dots,b_m)$,
let $P(\mathbf{a},\mathbf{b})$ be the probability that $\Gamma_1(L_i(\mathbf{x}))=a_i$ and $\Gamma_2(L_i(\mathbf{x}))=b_i$ for every $i$. By
Lemma 3.8 we can set $P(\mathbf{a},\mathbf{b})=p^{-d'd_1-md_2}+\epsilon(\mathbf{a},\mathbf{b})$, with $|\epsilon(\mathbf{a},\mathbf{b})|\le p^{d_1-d'd_1-r/2}$.

Now let $H$ be defined by the formula $h(x)=H(\Gamma_1x,\Gamma_2x)$. Because $h$ is constant on the
atoms of $\mathcal{B}_2$, $H$ is well-defined on the set of all elements of $\mathbb{F}_p^{d_1}\times\mathbb{F}_p^{d_2}$ of the form $(\Gamma_1x,\Gamma_2x)$.
Since $h$ takes values in $[-2,2]$, so does $H$.

Next, we show that $\mathbb{E}_bH(a,b)$ is small for any fixed $a\in\mathbb{F}_p^{d_1}$, using the facts that $h$
averages 0 on every cell of $\mathcal{B}_1$ and that it is constant on the cells of $\mathcal{B}_2$. Let us fix an $a$
and write $P(b)$ for the probability that $\Gamma_2(x)=b$ given that $\Gamma_1(x)=a$, that is, for the
density of $V_a\cap W_b$ inside $V_a$. Then

$$0=\mathbb{E}_{x\in V_a}h(x)=\mathbb{E}_{x\in V_a}H(\Gamma_1x,\Gamma_2x)=\sum_bP(b)H(a,b).$$

By Corollary 3.9, we can write $P(b)=p^{-d_2}+\epsilon(b)$, with $|\epsilon(b)|\le p^{d_1-r/2}$ for every $b$.
Therefore, the right-hand side differs from $\mathbb{E}_bH(a,b)$ by at most $2p^{d_1+d_2-r/2}$, which implies
that $|\mathbb{E}_bH(a,b)|\le2p^{d_1+d_2-r/2}$.
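Now

$$\mathbb{E}_{\mathbf{x}}\prod_{i=1}^mh(L_i(\mathbf{x}))=\mathbb{E}_{\mathbf{x}}\prod_{i=1}^mH(\Gamma_1(L_i(\mathbf{x})),\Gamma_2(L_i(\mathbf{x})))=\sum_{\mathbf{a}\in Z}\sum_{\mathbf{b}}P(\mathbf{a},\mathbf{b})\prod_{i=1}^mH(a_i,b_i).$$

Let us split up this sum as

$$p^{-d'd_1-md_2}\sum_{\mathbf{a}\in Z}\sum_{\mathbf{b}}\prod_{i=1}^mH(a_i,b_i)+\sum_{\mathbf{a}\in Z}\sum_{\mathbf{b}}\epsilon(\mathbf{a},\mathbf{b})\prod_{i=1}^mH(a_i,b_i).$$

The first term equals $\mathbb{E}_{\mathbf{a}\in Z}\prod_{i=1}^m(\mathbb{E}_bH(a_i,b))$, which is at most $(2p^{d_1+d_2-r/2})^m$. The second
is at most $p^{d'd_1+md_2}2^mp^{d_1-d'd_1-r/2}=2^mp^{d_1+md_2-r/2}$. Therefore, the whole sum is at most
$2^{m+1}p^{m(d_1+d_2)-r/2}$. Together with our estimate for the terms that involved $g$, this proves
the lemma. □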

We have almost finished the proof of our main result. 

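Theorem 3.13. For every $\epsilon>0$ there exists $c>0$ with the following property. Let
$f:\mathbb{F}_p^n\to[-1,1]$ be a $c$-uniform function. Let $\mathcal{L}=(L_1,\dots,L_m)$ be a square-independent
system of linear forms in $d$ variables, with Cauchy-Schwarz complexity at most 2. Then

$$\left|\mathbb{E}_{\mathbf{x}\in(\mathbb{F}_p^n)^d}\prod_{i=1}^mf(L_i(\mathbf{x}))\right|\le\epsilon.$$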

Proof. Let $\delta>0$ be a constant to be chosen later. Let $C$ be such that $2^{m+1}p^{-C/2}\le\epsilon/3$ and
let $r$ be the function $d\mapsto2md+C$. Then according to the structure theorem (Theorem
3.5) there exists $d_0$, depending on $r$ and $\delta$ only, and a quadratic factor $(\mathcal{B}_1,\mathcal{B}_2)$ of rank at
least $2m(d_1+d_2)+C$ and complexity $(d_1,d_2)$, with $d_1$ and $d_2$ both at most $d_0$, such that
we can write $f=f_1+f_2+f_3$, with $f_1=\mathbb{E}(f|\mathcal{B}_2)$, $\|f_2\|_2\le\delta$ and $\|f_3\|_{U^3}\le\delta$.

Let us show that the sum does not change much if we replace $f(L_m(\mathbf{x}))$ by $f_1(L_m(\mathbf{x}))$.
The difference is what we get if we replace $f(L_m(\mathbf{x}))$ by $f_2(L_m(\mathbf{x}))+f_3(L_m(\mathbf{x}))$. Now
$\|f_2\|_1\le\|f_2\|_2$ and $\|f\|_\infty\le1$, so the contribution from the $f_2$ part is at most $\delta$. As for the
$f_3$ part, since $\|f_3\|_{U^3}\le\delta$ and $\|f\|_\infty\le1$, Theorem 2.3 tells us that the contribution is at
most $\delta$. Therefore, the total difference is at most $\delta+\delta\le2\delta$.

Now let us replace $f$ by $f_1$ in the penultimate bracket. The same argument works, since
$\|f_1\|_\infty\le1$. Indeed, we can carry on with this process, replacing every single $f$ by $f_1$, and
the difference we make will be at most $2m\delta$.

We are left needing to show that the product with every $f$ replaced by $f_1$ is small. This is
what Lemma 3.12 tells us. It gives us an upper bound of $4^mcp^{d_1/4}+2^{m+1}p^{m(d_1+d_2)-r/2}$, where
for $r$ we can take $2m(d_1+d_2)+C$. Therefore, the upper bound is $4^mcp^{d_0/4}+2^{m+1}p^{-C/2}$,
which, by our choice of $C$, is at most $4^mcp^{d_0/4}+\epsilon/3$.

To finish, let $\delta=\epsilon/6m$. This determines the value of $d_0$ and we can then set $c$ to be
$4^{-m}p^{-d_0/4}\epsilon/3$, which will be a function of $\epsilon$ only. □
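
The bookkeeping in this proof can be summarized in a few lines of Python. This is only a sketch of how the parameters depend on one another: the callable `d0_of_delta` is a hypothetical stand-in for the bound $d_0(r,\delta)$ from Theorem 3.5, which the paper does not make explicit, and the numerical values used at the end are arbitrary.

```python
def choose_parameters(eps, m, p, d0_of_delta):
    """Choices made in the proof: delta = eps/(6m), C with 2^(m+1) p^(-C/2) <= eps/3,
    and c = 4^(-m) p^(-d0/4) eps/3.  d0_of_delta is a hypothetical placeholder for
    the (unspecified) dependence of d_0 on delta in the structure theorem."""
    delta = eps / (6 * m)
    C = 0
    while 2 ** (m + 1) * p ** (-C / 2) > eps / 3:
        C += 1
    d0 = d0_of_delta(delta)
    c = 4 ** (-m) * p ** (-d0 / 4) * eps / 3
    return delta, C, d0, c

def total_error(eps, m, p, d0_of_delta):
    """Sum of the three error terms that appear in the proof."""
    delta, C, d0, c = choose_parameters(eps, m, p, d0_of_delta)
    replacement_cost = 2 * m * delta               # replacing every f by f_1
    linear_part = 4 ** m * c * p ** (d0 / 4)       # products involving g
    quadratic_part = 2 ** (m + 1) * p ** (-C / 2)  # the product of the h terms
    return replacement_cost + linear_part + quadratic_part

example_total = total_error(0.1, 6, 5, lambda delta: 5)
```

With these choices each of the three contributions is at most $\epsilon/3$, so `example_total` should not exceed the chosen $\epsilon$.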

Because of our use of Theorem 3.5, the bounds in the above result and in the corollary
that we are about to draw from it are both very weak. However, we have been explicit
about all the bounds apart from $d_0$, partly in order to make it clear how the parameters
depend on each other and partly to demonstrate that our weak bound derives just from
the weakness of $d_0$ in the structure theorem.

Corollary 3.14. For every $\epsilon>0$ there exists $c>0$ with the following property. Let $A$
be a $c$-uniform subset of $\mathbb{F}_p^n$ of density $\alpha$. Let $\mathcal{L}=(L_1,\dots,L_m)$ be a square-independent
system of linear forms in $d$ variables, with Cauchy-Schwarz complexity at most 2. Let
$\mathbf{x}=(x_1,\dots,x_d)$ be a random element of $(\mathbb{F}_p^n)^d$. Then the probability that $L_i(\mathbf{x})\in A$ for
every $i$ differs from $\alpha^m$ by at most $\epsilon$.

Proof. We shall choose as our $c$ the $c$ that is given by the previous theorem when $\epsilon$ is
replaced by $\epsilon/2^m$. Our assumption is then that we can write $A=\alpha+f$ for a $c$-uniform
function $f$. The probability we are interested in is

$$\mathbb{E}_{\mathbf{x}\in(\mathbb{F}_p^n)^d}\prod_{i=1}^mA(L_i(\mathbf{x})),$$

which we can split into $2^m$ parts, obtained by replacing each occurrence of $A$ either by $\alpha$
or by $f$.

For each part that involves at least one occurrence of $f$, we have a power of $\alpha$ multiplied
by a product over some subsystem of $\mathcal{L}$. This subsystem will also be square-independent
and have CS-complexity at most 2. Moreover, the number of linear forms will have decreased.
Therefore, the previous theorem and our choice of $c$ tell us that the contribution
it makes is at most $\epsilon/2^m$. Therefore, the contribution from all such parts is at most $\epsilon$.
The only remaining part is the one where every $A(L_i(\mathbf{x}))$ has been replaced by $\alpha$, and that
gives us the main term $\alpha^m$. □



4. Concluding remarks 

First, we remark that Corollary 3.14 allows us to deduce rather straightforwardly a
Szemerédi-type theorem for square-independent systems of CS-complexity 2 which have
the additional property that they are translation-invariant. That is, one can show that
any sufficiently dense subset of $\mathbb{F}_p^n$ contains a configuration of the given type.

Without the result of the preceding section, establishing that any sufficiently dense subset
contains a solution to systems of this type would require a quadratic argument of the form
used by Green and Tao to prove Szemerédi's Theorem for progressions of length 4 in finite
fields [GT05b]. This would involve obtaining density increases on quadratic subvarieties
of $\mathbb{F}_p^n$, which then need to be linearized in a carefully controlled manner. Although it is
certainly possible to adapt their argument in this way, for purely qualitative purposes it
is much simpler to use the result that configurations of this type are governed by the $U^2$-norm,
which allows one to produce a density increase on an affine subspace. The resulting
argument is almost identical to the well-known argument for 3-term progressions [Mes95].
Translation invariance is needed because the subspace on which we find a density increment
may be an affine and not a strictly linear one. (It is not hard to show that the result is false
if the system is not translation invariant.) Recall that two examples of square-independent
translation invariant systems of complexity 2 are the systems $(x,x+y,x+z,x+y+z,x+
y-z,x+z-y)$ and $(x,x+a,x+b,x+c,x+a+b,x+a+c,x+b+c)$.

The second of these examples shows that our main result implies the following useful
"Pythagorean theorem," which generalizes the much more straightforward fact that if $a$ is
a constant and $f$ averages zero, then $\|a+f\|_{U^2}^4=a^4+\|f\|_{U^2}^4$.

Theorem 4.1. For every $\epsilon>0$ there exists $c>0$ such that if $f$ is a $c$-uniform function
from $\mathbb{F}_p^n$ to $[-1,1]$, $a\in[-1,1]$ is a constant, and $g(x)=a+f(x)$ for every $x\in\mathbb{F}_p^n$, then

$$\left|\|g\|_{U^3}^8-(a^8+\|f\|_{U^3}^8)\right|\le\epsilon.$$
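
Before turning to the proof, the elementary $U^2$ identity quoted above, $\|a+f\|_{U^2}^4=a^4+\|f\|_{U^2}^4$ for $f$ of mean zero, can be confirmed numerically. The Python sketch below works on $\mathbb{Z}_5$ with real-valued functions (an assumption made for brevity); the particular values of `raw` and `a` are arbitrary.

```python
def u2_fourth_power(f, N):
    """||f||_{U^2}^4 = E_{x,s,t} f(x) f(x+s) f(x+t) f(x+s+t) for real f on Z_N."""
    total = 0.0
    for x in range(N):
        for s in range(N):
            for t in range(N):
                total += f[x] * f[(x + s) % N] * f[(x + t) % N] * f[(x + s + t) % N]
    return total / N ** 3

N = 5
raw = [0.3, -0.8, 0.1, 0.9, -0.2]
mean = sum(raw) / N
f = [v - mean for v in raw]   # f now averages zero and still takes values in [-1, 1]
a = 0.4
g = [a + v for v in f]
lhs = u2_fourth_power(g, N)
rhs = a ** 4 + u2_fourth_power(f, N)
```

Here `lhs` and `rhs` should agree up to rounding. Theorem 4.1 says that the analogous statement for the $U^3$-norm holds approximately, provided that $f$ is sufficiently uniform.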

We briefly sketch the proof: expanding out the definition of $\|a+f\|_{U^3}^8$ one obtains a sum
of $2^8$ terms, one of which gives $a^8$ (if you choose $a$ from every bracket) and one of which
gives $\|f\|_{U^3}^8$ (if you choose $f$ from every bracket). All the remaining terms are constant
multiples of expectations of $f$ over linear configurations that are square-independent and
therefore, by our main result, small.

There are several ways in which the results of Section 3 might be generalized. An obvious
one is to prove comparable results for the group $\mathbb{Z}_N$. As we mentioned earlier, we have
a different proof for $\mathbb{F}_p^n$ and this can be transferred to $\mathbb{Z}_N$ by "semi-standard" methods.
(That is, the general approach is clear, but the details can be complicated and sometimes
require more than merely technical thought.) The alternative proof for $\mathbb{F}_p^n$ gives a doubly
exponential bound for the main result rather than the tower-type bound obtained here.

Possibly even more obvious is to try to extend the main result of this paper to a proof
of Conjecture 2.5. This involves a generalization in two directions: to systems of
CS-complexity greater than 2, and to systems with true complexity greater than 1. All further
cases will require polynomial Fourier analysis for a degree that is greater than 2: the
simplest is likely to be to show that a square-independent system with CS-complexity
3 has true complexity 1. In this case, we would use a decomposition into a structured
part (a projection onto a cubic factor) and a uniform part (which would be small in $U^4$
and therefore negligible) and then, as before, concentrate on the structured part.
Square-independence (which implies cube-independence) would ensure that we could reduce to the
linear part of the factor as before.

This state of affairs leaves us very confident that Conjecture 2.5 is true. Although cubic
and higher-degree Fourier analysis have not yet been worked out, they do at least exist in
local form for $\mathbb{Z}_N$: they were developed in [Gow01] to prove the general case of Szemerédi's
theorem. It is therefore almost certain that global forms will eventually become available,
both for $\mathbb{Z}_N$ and for $\mathbb{F}_p^n$. And then, given a statement analogous to Theorem 3.5, it is
easy to see how to generalize the main steps of our proof. In particular, the Gauss-sum
estimates on which we depend so heavily have higher-degree generalizations.

A completely different direction in which one might consider generalizing the above
results is to hypergraphs. For example, very similar proofs to those of Theorems 2.2 and
2.3 can be used to prove so-called "counting lemmas" for quasirandom hypergraphs, that is,
lemmas that assume that a certain norm is small and deduce that the hypergraph contains
approximately the expected number of small configurations of a given kind.

One can now ask whether, as with sets, weaker quasirandomness assumptions about a 
hypergraph suffice to guarantee the right number of certain configurations, and if so, which 
ones. It turns out to be possible to give a complete answer to a fairly natural formulation 
of this question. Unfortunately, however, the proof is rather too easy to be interesting, 
so here we content ourselves with somewhat informal statements of results concerning the 
special case of 3-uniform hypergraphs. The proofs we leave as exercises for any reader who 
might be interested. 

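Recall that if $X$, $Y$ and $Z$ are finite sets and $f : X \times Y \times Z \to \mathbb{R}$, then the octahedral 
norm of $f$ is the eighth root of 

$$\mathbb{E}_{x^{(0)},x^{(1)} \in X}\, \mathbb{E}_{y^{(0)},y^{(1)} \in Y}\, \mathbb{E}_{z^{(0)},z^{(1)} \in Z} \prod_{\epsilon \in \{0,1\}^3} f\bigl(x^{(\epsilon_1)}, y^{(\epsilon_2)}, z^{(\epsilon_3)}\bigr).$$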

It is easy to verify that if $X = Y = Z = G$ for some Abelian group $G$ and $f(x, y, z) = 
g(x + y + z)$ for some function $g$, then the octahedral norm of $f$ is the same as the 
$U^3$-norm of $g$. Therefore, it is natural to consider the octahedral norm of functions defined on 
$X \times Y \times Z$ as the correct analogue of the $U^3$-norm of functions defined on Abelian groups. 
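
The identity between the two norms can be checked numerically on a small cyclic group. The following is only a brute-force sketch, assuming $G = \mathbb{Z}_5$ and a real-valued $g$ chosen at random; the function names `octahedral_norm_pow8` and `u3_norm_pow8` are illustrative and do not come from the paper. It compares the eighth powers of the two norms.

```python
import itertools
import random


def octahedral_norm_pow8(f, X, Y, Z):
    """Eighth power of the octahedral norm of f on X x Y x Z (brute force)."""
    total = 0.0
    for x in itertools.product(X, repeat=2):          # (x^(0), x^(1))
        for y in itertools.product(Y, repeat=2):      # (y^(0), y^(1))
            for z in itertools.product(Z, repeat=2):  # (z^(0), z^(1))
                term = 1.0
                for e1, e2, e3 in itertools.product((0, 1), repeat=3):
                    term *= f(x[e1], y[e2], z[e3])
                total += term
    return total / (len(X) ** 2 * len(Y) ** 2 * len(Z) ** 2)


def u3_norm_pow8(g, N):
    """Eighth power of the U^3-norm of a real function g on Z_N (brute force)."""
    total = 0.0
    for x, a, b, c in itertools.product(range(N), repeat=4):
        term = 1.0
        for e1, e2, e3 in itertools.product((0, 1), repeat=3):
            term *= g[(x + e1 * a + e2 * b + e3 * c) % N]
        total += term
    return total / N ** 4


# Assumed toy setting: G = Z_5 and a random real-valued g.
N = 5
rng = random.Random(1)
g = [rng.uniform(-1, 1) for _ in range(N)]
G = range(N)


def f(x, y, z):
    return g[(x + y + z) % N]


lhs = octahedral_norm_pow8(f, G, G, G)
rhs = u3_norm_pow8(g, N)
print(lhs, rhs)  # the two values agree up to floating-point rounding
```

The agreement is exact in principle: the substitution $x = x^{(0)} + y^{(0)} + z^{(0)}$, $a = x^{(1)} - x^{(0)}$, $b = y^{(1)} - y^{(0)}$, $c = z^{(1)} - z^{(0)}$ turns one average into the other.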

An important fact about the octahedral norm is that $f$ has small octahedral norm if 
and only if it has a small correlation with any function of the form $u(x,y)v(y,z)w(x,z)$. 
Another important fact, the so-called "counting lemma" for quasirandom hypergraphs, 
states the following. Let $X$ be a finite set and let $H$ be a 3-uniform hypergraph with 
vertex set $X$ and density $\alpha$. Suppose that $H$ is quasirandom in the sense that the function 
$H(x,y,z) - \alpha$ has small octahedral norm (where $H(x,y,z) = 1$ if $\{x,y,z\} \in H$ and $0$ 
otherwise). Then $H$ has about the expected number of copies of any fixed small hypergraph. 
For instance, if you choose $x$, $y$, $z$ and $w$ randomly from $X$, then the probability that all 
of $\{x, y, z\}$, $\{x, y, w\}$, $\{x, z, w\}$ and $\{y, z, w\}$ belong to $H$ is approximately $\alpha^4$. 

Now let us suppose that $g$ is uniform but not necessarily quadratically uniform, and 
that we again define $f(x,y,z)$ to be $g(x + y + z)$. What can we say about $f$? It is not 
necessarily the case that $f$ has small octahedral norm, or that it has low correlation with 
functions of the form $u(x, y)v(y, z)w(x, z)$. However, it is not hard to show that it has low 
correlation with any function of the form $a(x)b(y)c(z)$, a property that was referred to as 
vertex uniformity in [Gow04]. 

One might therefore ask whether vertex uniformity is sufficient to guarantee the right 
number of copies of some small hypergraphs. However, well-known and easy examples 
show that it does so only for hypergraphs such that no pair $\{x, y\}$ is contained in more 
than one hyperedge. For instance, let $u$ be a random symmetric function from $X^2$ to 
$\{-1, 1\}$ and let $H(x, y, z) = (3 + u(x, y) + u(y, z) + u(x, z))/6$. Then $H$ is vertex uniform 
and has density $1/2$, but it is a simple exercise to show that $\mathbb{E}_{x,y,z,w} H(x,y,z)H(x,y,w)$ is 
about $5/18$ instead of the expected $1/4$. 
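
The value $5/18$ comes from expanding the product: the only cross term that survives the averaging is $u(x,y)^2 = 1$, which gives $(9 + 1)/36$. As a hedged numerical sanity check, the sketch below samples one random symmetric $u$ on an assumed vertex set of size 200 (the size, the seed and the function name `pair_statistics` are illustrative choices, not taken from the paper) and uses the identity $\mathbb{E}_{x,y,z,w} H(x,y,z)H(x,y,w) = \mathbb{E}_{x,y} \bigl(\mathbb{E}_z H(x,y,z)\bigr)^2$ to avoid a four-fold loop.

```python
import random


def pair_statistics(n, seed=0):
    """Return (density of H, E_{x,y,z,w} H(x,y,z) H(x,y,w)) for one random u."""
    rng = random.Random(seed)
    # u: random symmetric function X^2 -> {-1, 1}, stored as an n x n table.
    u = [[0] * n for _ in range(n)]
    for x in range(n):
        for y in range(x, n):
            u[x][y] = u[y][x] = rng.choice((-1, 1))
    # By symmetry of u, E_z u(x, z) = row_mean[x] and E_z u(y, z) = row_mean[y].
    row_mean = [sum(row) / n for row in u]
    density = 0.0
    pair_stat = 0.0
    for x in range(n):
        for y in range(n):
            # E_z H(x, y, z) with H = (3 + u(x,y) + u(y,z) + u(x,z)) / 6.
            avg_z = (3 + u[x][y] + row_mean[y] + row_mean[x]) / 6
            density += avg_z
            pair_stat += avg_z * avg_z
    return density / n ** 2, pair_stat / n ** 2


density, pair_stat = pair_statistics(200)
print(density, pair_stat, 5 / 18)
```

For a set of this size the printed density should be close to $1/2$ and the second value close to $5/18 \approx 0.278$ rather than $1/4$, up to random fluctuations of order $1/n$.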

However, this is perhaps not the right question to be asking. If $g$ is uniform, then $f$ has 
a stronger property than just vertex uniformity: one can prove that it does not correlate 
with any function of the form $u(x,y)w(x, z)$, $u(x,y)v(y, z)$ or $v(y, z)w(x, z)$. If we take 
this as our definition of "weak quasirandomness" for functions (and call the hypergraph 
$H$ weakly quasirandom if the function $H - \alpha$ is), then which hypergraphs appear with the 
right frequency (or with "frequency zero" if we are talking about functions rather than 
sets)? The answer turns out to be that a sum over copies of a small hypergraph $H'$ will 
have the "right" value if and only if there is a pair $\{x, y\}$ that belongs to exactly one 
hyperedge $\{x, y, z\}$ of $H'$. The proof in the "if" direction is an easy exercise. In particular, 
it does not involve any interesting results about decomposing hypergraphs, which suggests 
that the main result of this paper is, in a certain sense, truly arithmetical. 
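
The combinatorial criterion just stated is easy to test mechanically. The sketch below is purely illustrative: it assumes $H'$ is given as a list of distinct 3-element vertex tuples, and the function name `has_pair_in_exactly_one_edge` is ours. It checks whether some pair of vertices lies in exactly one hyperedge.

```python
from collections import Counter
from itertools import combinations


def has_pair_in_exactly_one_edge(edges):
    """True if some pair {x, y} is contained in exactly one hyperedge."""
    counts = Counter()
    for edge in edges:
        # Count every 2-subset of the hyperedge, in a canonical order.
        for pair in combinations(sorted(edge), 2):
            counts[pair] += 1
    return any(c == 1 for c in counts.values())


# Two hyperedges sharing the pair {0, 1}: the pair {0, 2} lies in only one.
two_edges_sharing_pair = [(0, 1, 2), (0, 1, 3)]
# The tetrahedron: every pair lies in exactly two of the four faces.
tetrahedron = list(combinations(range(4), 3))

print(has_pair_in_exactly_one_edge(two_edges_sharing_pair))  # True
print(has_pair_in_exactly_one_edge(tetrahedron))             # False
```

Under the criterion above, the configuration of two hyperedges sharing a pair is therefore counted correctly under weak quasirandomness, whereas the tetrahedron is not covered.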

As for the "only if" direction, here is a quick indication of how to produce an example 
(in the complex case, for simplicity). Suppose that no pair $\{x, y\}$ belongs to more than $m$ 
hyperedges in $H'$. For each $k$ between 2 and $m$ let $f_k : X^2 \to \mathbb{C}$ be a function whose values 
are randomly chosen $k$th roots of unity. Then let $f(x, y, z)$ be the sum of all functions of 
the form $u(x, y)v(y, z)w(x, z)$, where each of $u$, $v$ and $w$ is some $f_k$ with $2 \le k \le m$. When 
one expands out the relevant sum for this function $f$, one finds that most terms cancel, but 
there will be some that don't and they will all make a positive contribution. To find such 
a term, the rough idea is to choose for each face $F$ of $H'$ a triple of functions $(f_{k_1}, f_{k_2}, f_{k_3})$, 
where $k_1$, $k_2$ and $k_3$ are the numbers of faces of $H'$ that include each of the three edges that 
make up the face $F$. For this term, each time a $k$th root of unity appears in the product, 
it is raised to the power $k$, so the term is large. 

Finally, let us say just a little bit more about the result of Leibman mentioned at the 
beginning of the paper. The question in ergodic theory which is analogous to the one we 
were studying in Section 3 concerns so-called "characteristic factors" for ergodic averages 
of the form 

$$\frac{1}{N^d} \sum_{n_1,\dots,n_d=1}^{N} \int_X T^{L_1(n)}f_1(x)\, T^{L_2(n)}f_2(x) \cdots T^{L_m(n)}f_m(x)\, d\mu(x),$$

where $T$ is a measure-preserving map on a probability measure space $(X, \mathcal{B}, \mu)$ and the 
functions $f_i$ belong to $L^\infty(\mu)$. A characteristic factor is a system onto which one can 
project without losing any quantitative information about the average under consideration. 
The aim is to find characteristic factors which possess enough structure to allow one to 
establish convergence of the above average in a rather explicit way. For example, it was 
shown by Host and Kra [HK05] and Ziegler [Zie07] independently that when the linear forms 
$L_1, \dots, L_m$ describe an arithmetic progression of length $m$, then there exists a characteristic 
factor for the average which is isomorphic to an inverse limit of a sequence of 
$(m-2)$-step nilsystems. For $m = 4$, these very structured objects are closely related to the 
quadratic factor we are using in this paper, on which computations can be performed 
rather straightforwardly. After these remarks it should not be surprising that there is 
a notion of degree associated with a characteristic factor. What we have called the true 
complexity of a linear system is closely analogous to the degree of the minimal characteristic 
factor. 

Leibman [Lei07] characterizes the degree of the minimal characteristic factor for general 
linear as well as certain polynomial systems. As his definition of complexity in the ergodic 
context is highly technical, we shall simply illustrate the analogy with our result by quoting 
two examples from Section 6 of his paper. In our terminology, both of the systems given 
by $(x + n + m, x + 2n + 4m, x + 3n + 9m, x + 4n + 16m, x + 5n + 25m, x + 6n + 36m)$ 
and the ever so slightly different $(x + n + m, x + 2n + 4m, x + 3n + 9m, x + 4n + 16m, 
x + 5n + 25m, x + 6n + 37m)$ have CS-complexity 2. However, the second one has true 
complexity 1 since its squares are independent, or, as Leibman puts it, because the six vectors 
$(1,1,1,1,1,1)$, $(1,c_1,c_2,\dots,c_5)$, $(1,d_1,d_2,\dots,d_5)$, $(1,c_1^2,c_2^2,\dots,c_5^2)$, $(1,d_1^2,\dots,d_5^2)$ and 
$(1,c_1d_1,c_2d_2,\dots,c_5d_5)$ span $\mathbb{R}^6$. (Here $c_i$, $d_i$ are the coefficients of $n$, $m$, respectively, in the 
linear form $i + 1$. Note that the special form of the ergodic average forces one to consider 
translation-invariant systems only, which leads to a formulation of square-independence 
that is particular to systems where one variable has coefficient 1 in all linear forms.) 
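
The spanning condition quoted above can be verified directly for the two example systems. The following is a small sketch using exact rational arithmetic; the helper names `rank` and `leibman_vectors` are ours, and each system is encoded simply as the list of coefficient pairs $(c, d)$ of $n$ and $m$ in its six forms (the first form contributing the leading entry 1 of each vector).

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix over the rationals, by Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def leibman_vectors(coeffs):
    """The six vectors (1), (c), (d), (c^2), (d^2), (cd), one entry per form."""
    c = [ci for ci, _ in coeffs]
    d = [di for _, di in coeffs]
    return [
        [1] * len(coeffs),
        c,
        d,
        [x * x for x in c],
        [y * y for y in d],
        [x * y for x, y in zip(c, d)],
    ]


# Forms x + c*n + d*m with (c, d) = (k, k^2) for k = 1, ..., 6.
first_system = [(k, k * k) for k in range(1, 7)]
# Same system, with the last form changed to x + 6n + 37m.
second_system = first_system[:-1] + [(6, 37)]

print(rank(leibman_vectors(first_system)))   # 5: the vectors (d) and (c^2) coincide
print(rank(leibman_vectors(second_system)))  # 6: the vectors span R^6
```

In the first system $d_i = c_i^2$ for every form, so two of the six vectors are equal and the rank drops to 5; changing 36 to 37 breaks this coincidence.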

In his proof of Szemerédi's Theorem, Furstenberg [Fur77] developed an important tool 
known as the correspondence principle, which allowed him to deduce Szemerédi's combinatorial 
statement from the recurrence properties of a certain dynamical system. Our result 
in the $\mathbb{Z}_N$ case does not appear to follow from Leibman's result by a standard application 
of the correspondence principle. For an excellent introduction to ergodic theory and its 
connections with additive combinatorics, we refer the interested reader to [Kra06]. 